The Impact of Device-Width Quantization on Digital Circuit Design Using FinFET Structures

EE 241 SPRING 2004

Farhana Sheikh, Vidya Varadarajan
{farhana, vidya}@eecs.berkeley.edu

Abstract
FinFET structures enable scaling to very short gate lengths beyond 10nm; however, device-width quantization has been identified as a possible technology disrupter for their widespread adoption. Prior work has shown that the effect is more pronounced for circuits, such as latches, that are sensitive to the β-ratio, and that special tools and algorithms are required to migrate bulk CMOS designs to FinFET-based circuits. This paper investigates the effect of width granularity on functionality, performance, and power using three circuits that typify digital design: an inverter network, an SRAM cell, and a static SR flip-flop. We propose to employ HSPICE and Matlab scripts based on existing energy-delay optimization tools [6] to quantify the effects of width granularity.

Index Terms: FinFET, device-width quantization, scaling, sizing

I. INTRODUCTION
Short-channel effects such as sub-threshold and gate-dielectric leakage in conventional CMOS devices are primary limiters for scaling. Novel device architectures will therefore be necessary to continue reaping the benefits of scaling to gate lengths beyond 10nm. Double-gate CMOS (DGCMOS) devices offer an attractive alternative to other structures, such as ultra-thin body (UTB) or conventional bulk CMOS, in terms of performance, control of short-channel effects, and manufacturability [1]. The FinFET, the most popular realization of a double-gate device, offers a distinctive way to manage leakage currents without compromising performance.
In [1] the authors show that, for very low I_OFF applications (on the order of picoamperes), the V_T of double-gate devices may be set as much as 200mV lower than that of conventional single-gate devices, yielding as much as 60% more overdrive current for double-gate designs at a low supply voltage. This is a significant advantage over conventional planar devices at the same technology node. However, the FinFET presents a new complication for designers, namely device-width quantization [2, 3, 4]. In FinFET technology, device widths are quantized into units of whole fins because the SOI film thickness, and hence the fin height, is uniform. Recently published literature shows that the device-width quantization problem is significantly more severe for circuits sensitive to the beta (β) ratio, such as SRAM cells, latches, and dynamic circuits [3]. Device-width quantization has only recently been identified as a possible technology disrupter for the adoption of FinFET technology, and very few studies have investigated the resulting circuit design issues and possible solutions [2, 5, 3, 9].

In this project we propose to investigate transistor sizing issues resulting from the use of FinFETs in three types of simple digital circuits that are representative of those commonly found in microprocessor design. The inverter network represents combinational circuits, the six-transistor SRAM cell represents the fundamental building block of memory circuits, and the static SR flip-flop represents sequential circuits. Each circuit's operation and reliability has a different sensitivity to sizing and β-ratio. In addition, tapered stacks can be challenging to realize with FinFETs. In this paper we quantify the impact of device-width quantization on the operation, performance, and power of each type of circuit using HSPICE and Matlab scripts based on the tool described in [6].
Section II presents background material on FinFET structure and processing and introduces the problem of device-width quantization. Section III reviews publications that address design issues resulting from width granularity and their possible solutions. In Section IV we detail the benchmark circuits and the sizing issues they present to the circuit designer, highlighting where device-width quantization may cause problems for circuit designers and CAD tools. Section V describes our width quantization algorithm, and Section VI presents further details of our analysis methodology. We summarize the paper in Section VII.

II. FINFET TECHNOLOGY

A. FinFET Structure and Process Flow
A pictorial representation of a double-gate FET (DG-FET) is shown in Figure 1. Short-channel effects in such a structure are better controlled than in single-gate devices because the channel is controlled by two gates instead of one. The advantages of DG-FETs are best realized when the two gates are perfectly aligned with each other. The FinFET is one such successful implementation of a double-gate device. Beyond its advantages in electrical performance and scalability, it also provides benefits in fabrication and manufacturability: its process flow and layout are reasonably compatible with the existing bulk CMOS process, making it more attractive for manufacturing [3]. Owing to its superior performance and fabrication benefits, FinFET production may start as early as the 65nm technology node [5].

The FinFET deals easily with gate alignment and S/D alignment by rotating the silicon film of a planar double-gate structure to a vertical orientation. Forming the gate around the fin makes the two gates self-aligned, and the S/D regions are self-aligned as well. The structure and cross section of a typical FinFET are shown in Figure 2: a thin silicon fin with a gate line wrapped around it. The process flow, summarized in Figure 3, involves the following basic steps [3]: (i) etching Si fins out of the silicon layer of an SOI wafer, (ii) forming the gate around the fin, which creates the front and back gates, and (iii) forming spacers, followed by S/D implantation.

As Figure 2 and Figure 4 show, the height of the silicon fin defines the transistor width. The thickness of the fin, on the other hand, determines how strongly the back gate controls the channel and hence the short-channel behavior of the device. Because the fins are defined in the silicon film of an SOI wafer, the fin height is effectively the same for all transistors.

Figure 1: Double-gate MOSFET
Figure 2: FinFET structure
Figure 3: FinFET manufacturing process flow
Figure 4: Device-width quantization

B. Device-Width Quantization
Each fin provides 2H of device width, where H is the height of the fin, so the per-fin width 2H sets the increment in device width available to the circuit designer. In planar devices, by contrast, the device-width quantum is dictated by the grid step size of the design database employed.
This relatively unconstrained choice of device width allows designers to select N-MOSFET and P-MOSFET sizes in whatever ratio achieves the desired tradeoff among performance, power, and robustness. Under the quantization constraint, it is much more difficult to achieve the required beta ratios in FinFETs [2].

III. PRIOR WORK AND MOTIVATION
The device-width granularity issue has been identified only recently and has received little study. The problem can be tackled at the circuit design level [5], at the processing level [3, 9], or at the device physics level [12]. The most promising approaches are presented below. The ideas in subsections B and C have not previously been studied from the quantization perspective, but they can be considered viable modifications to the conventional FinFET structure that could overcome, or at least reduce, the effect of device-width quantization. They share the performance benefits of the FinFET but present other issues in processing or layout.
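To make the granularity of Section II-B concrete, the following Python sketch (a minimal illustration, not the authors' Matlab tooling) rounds a continuous target width to the nearest whole number of fins, each assumed to contribute 2H of width. The function name is hypothetical, the 140nm fin height is the 90nm-node value quoted in Section VI [3], the target widths are arbitrary, and the round-to-nearest policy with ties going to fewer fins is an assumption.

```python
import math


def quantize_width(target_width_nm: float, fin_height_nm: float):
    """Round a continuous device width to a whole number of fins.

    Returns (fin_count, realized_width_nm, relative_error). Each fin is assumed
    to contribute 2H of width (front and back gate); ties go to fewer fins.
    """
    per_fin = 2.0 * fin_height_nm
    exact = target_width_nm / per_fin  # fractional fin count the optimizer asked for
    candidates = sorted({max(1, math.floor(exact)), max(1, math.ceil(exact))})
    n = min(candidates, key=lambda k: abs(k * per_fin - target_width_nm))
    realized = n * per_fin
    return n, realized, abs(realized - target_width_nm) / target_width_nm


if __name__ == "__main__":
    H = 140.0  # fin height in nm (90nm-node value cited in Section VI)
    for w in (420.0, 1000.0, 4000.0):  # arbitrary illustrative target widths
        n, realized, err = quantize_width(w, H)
        print(f"target {w:6.0f} nm -> {n:2d} fins = {realized:6.0f} nm ({100 * err:4.1f}% off)")
```

For these three targets the sketch picks 1, 4, and 14 fins, with rounding errors of roughly 33%, 12%, and 2%: the relative error shrinks as the device grows, which is why small, ratio-sensitive devices are hit hardest.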

A. Design Level Optimization
At the design level, new optimization strategies are being developed to find the best quantized device widths. Although a clear solution to sizing is not yet available, the effect of device-width granularity on some performance metrics has been evaluated [5]. Only circuits sensitive to transistor sizes are expected to be significantly impacted in this technology; as mentioned in Section I, these include SRAM cells, latches, dynamic logic, and tapered stacks. A study of latch circuits shows an 8% deviation from the optimal performance point in single-gate technology. The work evaluates the effect of quantization on the power-delay product (PDP) and noise margin for a logic latch and a general-purpose latch. The authors propose a quantization error metric, defined in the performance-size space as half the maximum distance to the adjacent performance point, normalized to its performance value [5].

B. Tri-Gate Transistor Design
The tri-gate structure is an extension of the FinFET structure that allows variable device widths. Although it has not been proposed as a solution to the quantization problem, it provides benefits similar to those of a double-gate device without the adverse quantization effects. This device uses the top of the fin as a third gate controlling the channel [9, 12], which relaxes the constraint on fin thickness for the same level of short-channel effects. The top of the fin can therefore be made wider or narrower to achieve the desired device width. Even though the tri-gate transistor appears to be a reasonable solution to the quantization problem, it presents a few issues. FinFET structures can be processed using spacer lithography, which is much simpler to implement than the aggressive lithography techniques required for the tri-gate structure. Also, the device retains its advantages only while the fin width does not exceed the gate length.
Thicker films (which could be required to achieve the desired device widths) degrade short-channel behavior, and the device starts to resemble a UTB FET more than a double-gate FET. If the fin thickness may vary up to one gate length, it would probably become challenging for software tools like FinGEN [3] to generate the exact number of fins and the fin widths for each transistor in a large layout, and human intervention may be required for final optimization. It would therefore be worthwhile to develop tools that optimize circuits without altering the device structure. Even though such tools may not generate the optimal circuit, the performance loss may be small [5] compared to the effort required to introduce a different structure.

C. Transistor Orientation Optimization
Studies have examined the effect of crystal orientation on the drive current of FinFETs. NMOS devices perform best in the (100) orientation and PMOS devices in the (110) orientation. The hole mobility enhancement is on the order of 100-150%, which maps to about a 20% increase in performance. On the other hand, using a (110) conduction path for NMOS devices reduces their performance by about 8% [12]. Although this has not been studied specifically, one could probably relax the sizing requirements by orienting PMOS and NMOS devices differently, adjusting the ratio of their current drives rather than their widths. More simulation studies are needed before the benefits of this approach can be claimed with confidence. Even if the method merits study, it would be very difficult to implement in practice, because orienting the devices of an entire circuit this way complicates routing, and mixing orientations on the same layout may incur an area penalty.

D. Layout Modification
One of the processing strategies for drawing the fins is spacer lithography. This technique always produces fins in pairs, as shown in Figure 4, and the fins must be separated if they are to belong to different transistors. This fin removal or separation is called a trim process and requires a separate mask. It has been found that placing the fin and trim masks intelligently in the layout can mitigate the quantization effect. However, this is typically not the best possible solution, and further manual adjustment of the number of fins is required to obtain the optimal circuit.

IV. CIRCUITS
Transistor sizing in conventional CMOS design is an efficient and powerful method for minimizing power under delay constraints or minimizing delay under power constraints. In addition, the reliability and operation of some types of circuits, such as latches, memory cells, and dynamic circuits, depend strongly on the ratio of PMOS and NMOS transistor sizes. While FinFET technology gives digital designers significant advantages in leakage power management and performance in the sub-90nm regime, device-width quantization may significantly affect the delay, reliability, and operation of circuits sensitive to β-ratios. An analysis of the impact of device-width quantization is therefore necessary to evaluate whether it limits the widespread adoption of FinFET devices in digital design. An inverter network, a six-transistor SRAM cell, and a clocked static SR flip-flop are chosen to represent three different types of circuits used in microprocessor design: combinational logic, memory, and sequential logic. Each circuit has a different degree of sensitivity to transistor sizing.

A. Inverter Network
The inverter network is representative of combinational circuits, whose operation and reliability are relatively insensitive to sizing. Such a network is shown in Figure 5.
However, it is well understood that sizing affects noise margins, performance, and power [11]. The PMOS and NMOS transistor sizes must therefore be selected carefully to optimize the tradeoff among performance, reliability, and power. In [11] it is shown that the optimum β-ratio for an inverter is 2.4 when identical rising and falling delays are desired; this is the best operating point when designing for the worst case.

Figure 5: Inverter network (three inverter stages, 1, 2, and 3, driving a load C_L = 64 C_g,1)

It can be shown that for each logic gate and combinational network there is an optimal sizing of the NMOS and PMOS transistors that achieves minimum delay under a power constraint, or minimum power under a delay constraint [7, 11]. Because transistor sizes for both NMOS and PMOS can be chosen from a continuous range, the problem can be formulated as a convex optimization problem and solved exactly. However, when the optimal transistor sizes must be mapped to a discrete set, the problem becomes much more complicated and is known to be computationally hard [13]. Heuristics exist to find good mappings; in the case of discrete gate sizing, technology mapping algorithms map gates to a set of discretely sized gates to achieve the best possible performance [14]. For FinFET-based combinational circuits, two issues must be explored:
1. How does one achieve the optimal β-ratio for each gate given a set of discrete sizes for each type of transistor? It is no longer straightforward to apply the method of logical effort.
2. How does one size gates so that optimal performance is achieved under the given constraints?

The second problem has already been investigated thoroughly in the literature as a technology mapping problem [14], in which a given circuit must be mapped to a library of standard cells that includes only a discrete number of sizes for each gate. The problem is known to be NP-complete; however, good heuristics exist that solve it efficiently and give good results [14].
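Issue 1 can be illustrated with a short Python sketch. Assuming equal gate lengths and the same fin height for NMOS and PMOS devices, the achievable β-ratio reduces to n_p/n_n, a ratio of whole fin counts. The hypothetical function below exhaustively searches fin pairs up to a per-device fin budget (also an illustrative assumption) for the ratio closest to the 2.4 target from [11], breaking ties in favor of fewer total fins.

```python
def best_fin_ratio(beta_target: float, max_fins: int):
    """Exhaustively choose PMOS/NMOS fin counts (n_p, n_n) whose ratio is closest
    to beta_target; ties go to the smaller total fin count (less area and load).
    Returns (n_p, n_n, absolute_ratio_error)."""
    best_key, best_pair = None, None
    for n_n in range(1, max_fins + 1):
        for n_p in range(1, max_fins + 1):
            key = (abs(n_p / n_n - beta_target), n_p + n_n)
            if best_key is None or key < best_key:
                best_key, best_pair = key, (n_p, n_n)
    return best_pair[0], best_pair[1], best_key[0]


if __name__ == "__main__":
    for budget in (2, 5, 7, 12):  # hypothetical fin budgets per device
        n_p, n_n, err = best_fin_ratio(2.4, budget)
        print(f"max {budget:2d} fins: {n_p}:{n_n} -> beta = {n_p / n_n:.3f} (error {err:.3f})")
```

Under these assumptions the closest achievable ratio is 2 for budgets of up to four fins per device, 2.5 for five or six, 7:3 (about 2.33) from seven to eleven, and an exact 12:5 first appears at a budget of 12 fins, so matching a continuous β-ratio closely can cost considerable area.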
The first problem is the one of interest in this study, and we explore it through simulation and Matlab scripts as described in Section VI.

B. SRAM Cell
SRAM cells are the building blocks of static random-access memories. The cells must be as small as possible to achieve high density. However, correct read operation depends on careful sizing of M1 and M5, and correct write operation depends on careful sizing of M4 and M6. As explained in [11], reading is the critical operation. If M5 is made minimum size, then M1 must be made large enough to limit the voltage rise on Q so that the M3-M4 inverter does not inadvertently switch and accidentally write a 1 into the cell. The correct size of M1 can be obtained by simulation or by solving the following constraint for a given voltage ripple Δv:

$$\Delta v = \frac{V_{DSATn} + CR\,(V_{DD} - V_{Tn}) - \sqrt{V_{DSATn}^{2}\,(1 + CR) + CR^{2}\,(V_{DD} - V_{Tn})^{2}}}{CR}, \qquad CR = \frac{W_1/L_1}{W_5/L_5}$$

Equation 1: Constraint on the sizes of M1 and M5

For a FinFET SRAM cell, device-width quantization limits the width of M1 to integer multiples of the per-fin width 2H, so M1 will generally be either larger or smaller than the single-gate optimum, resulting in sub-optimal operation: a larger cell in the first case, a larger read ripple in the second.

Figure 6: SRAM cell (transistors M1-M6, bit lines BL and BL̄, word line WL, storage nodes Q and Q̄)

C. Static SR Flip-Flop
The static SR flip-flop is very similar to the SRAM cell; in this case, however, extra NMOS transistors are added for the clocked inputs that drive the flip-flop from one state to the next. As explained in [11], once the sizes of the M1-M2 and M3-M4 inverters are chosen, the sizing of M5, M6, M7, and M8 is critical for correct operation. The switching threshold of the ratioed inverter formed by (M5-M6) and M2 must be below the switching threshold of the M3-M4 inverter to allow the flip-flop to switch from the Q=0 to the Q=1 state.
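Equation 1 can be evaluated directly for whole fin counts. The Python sketch below assumes equal lengths L1 = L5, so that CR reduces to the fin-count ratio n1/n5, and uses hypothetical device parameters (V_DD = 1.0 V, V_Tn = 0.3 V, V_DSATn = 0.4 V) and a hypothetical 0.1 V ripple limit chosen only for illustration. It finds the smallest continuous CR by bisection and the smallest whole fin count for M1 that meets the same limit; the function names are illustrative.

```python
import math

# Hypothetical device parameters in volts, for illustration only.
VDD, VTN, VDSATN = 1.0, 0.3, 0.4


def read_ripple(cr: float) -> float:
    """Equation 1: voltage rise on Q during a read for cell ratio CR = (W1/L1)/(W5/L5)."""
    a = VDD - VTN
    root = math.sqrt(VDSATN**2 * (1 + cr) + cr**2 * a**2)
    return (VDSATN + cr * a - root) / cr


def min_cell_ratio(max_ripple: float, lo: float = 1e-3, hi: float = 100.0, iters: int = 100) -> float:
    """Smallest continuous CR meeting the ripple limit, found by bisection.
    Assumes the ripple falls monotonically with CR, as it does for these parameters."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if read_ripple(mid) <= max_ripple:
            hi = mid
        else:
            lo = mid
    return hi


def min_fins_m1(max_ripple: float, fins_m5: int = 1) -> int:
    """Smallest whole fin count for M1 meeting the limit, assuming L1 = L5 so CR = n1/n5."""
    if max_ripple <= 0:
        raise ValueError("ripple limit must be positive")
    n1 = 1
    while read_ripple(n1 / fins_m5) > max_ripple:
        n1 += 1
    return n1


if __name__ == "__main__":
    limit = 0.1  # hypothetical maximum allowable ripple on Q, in volts
    print(f"continuous CR needed: {min_cell_ratio(limit):.3f}")
    for n5 in (1, 2):
        n1 = min_fins_m1(limit, n5)
        print(f"M5 = {n5} fin(s): M1 needs {n1} fins (CR = {n1 / n5:.2f}), "
              f"ripple {read_ripple(n1 / n5):.3f} V")
```

With these illustrative numbers the continuous solution is a CR slightly below 2.5, while a one-fin M5 forces M1 up to three fins (CR = 3). Giving M5 two fins allows a five-fin M1 (CR = 2.5), which tracks the continuous optimum more closely at the cost of a larger cell.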
The sizes of these transistors can be determined through simulation or by solving a constraint equation similar to Equation 1, in which M5 and M6 are treated together as a single transistor whose length is twice that of the individual transistors. The device-width quantization issues are similar to those discussed for the SRAM cell; in this case, however, four transistors rather than one must be carefully sized.

Figure 7: Static SR flip-flop (cross-coupled inverters M1-M2 and M3-M4; clocked inputs S and R applied through M5-M6 and M7-M8)

D. Tapered Stacks
Tapered stacks in circuits such as the NAND3 depicted in Figure 8 arise from sizing each transistor in a stack differently according to the load the gate is driving. This is useful when the early-arriving input drives the largest transistor, which minimizes the load on the late-arriving signals. For example, if input C arrives early on M3 and A arrives late on M1, then progressively sizing the transistors, with the smallest transistor closest to the output, can reduce delay by more than 20% [11]. However, these gains diminish as technology shrinks. Tapered stacks also carry a significant layout penalty, because diffusion sharing is no longer possible for the stacked transistors, and the transistors must be sized carefully to avoid loading the inputs too heavily. Device-width quantization can adversely affect tapered stacks: because each transistor width must be an integer multiple of the per-fin width, the designer no longer has the flexibility to choose a constant tapering ratio, which reduces the performance gain.

Figure 8: NAND3 with tapered stack (inputs A, B, and C; intermediate node capacitances C1 and C2; load C_L)

V. WIDTH QUANTIZATION ALGORITHM
One of the key issues in dealing with device-width quantization is mapping the optimal sizes for a design to a set of discrete sizes based on the fin height. Discrete optimization is known to be a hard problem in the computer science literature [13]. Popular solutions include relaxation-based optimization followed by heuristics such as randomized rounding, or approximation algorithms that map to discrete values. Unfortunately, creating a good heuristic for determining optimum transistor widths that are discrete multiples of the fin width is beyond the scope of this project. Our approach is simple: we enumerate all possible sizes for all combinations of critical transistors and pick the sizes that give the best performance-power tradeoff. This exhaustive search is compute-intensive; strictly speaking, it is a brute-force enumeration rather than a greedy or branch-and-bound method, although branch-and-bound could be used to prune it. We can afford it for our set of circuits because the number of critical transistors that must be sized carefully is small. The approach is impractical for circuits containing a large number of transistors, for which a good heuristic would be required.

VI. ANALYSIS METHODOLOGY
Our analysis methodology consists of finding the optimum size of each transistor in each circuit for minimum delay and then mapping these sizes to multiples of the device-width quantum, choosing the quantized size that gives the best performance. Reliable operation of each circuit must also be guaranteed. We employ the enumeration technique described in Section V to determine the best width for each transistor. Each choice is evaluated in terms of noise margins, delay, and power, using a combination of HSPICE and Matlab. The Matlab scripts we will create are a variant of the tool described in [6], which optimizes sizes for logic gates in a combinational gate-level netlist. Our scripts will take as input a transistor-level netlist and appropriate models for NMOS and PMOS transistors and determine the best size for each transistor based on energy-delay tradeoffs. Once the optimum sizing is determined in the continuous domain, a simple algorithm maps these sizes to optimum discrete sizes, again choosing by energy-delay metrics. We propose to use models for a 90nm technology for which the fin height is 140nm [3].
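The continuous-then-discrete flow described above can be illustrated with a toy Python sketch (not the Matlab tool of [6]) for the three-stage inverter network of Figure 5. The delay and energy expressions are simplified logical-effort-style models in normalized units, and the intrinsic-capacitance ratio GAMMA, the size ranges, and the one-fin-per-unit assumption are illustrative choices rather than values from this study. A fine grid stands in for a continuous optimizer and whole numbers stand in for fin counts; both are searched exhaustively, and the sketch reports the energy-delay product (EDP) at each optimum and the relative EDP penalty of quantization.

```python
import itertools

# Illustrative assumptions (not values from this study):
GAMMA = 1.0    # intrinsic (drain) capacitance relative to gate capacitance
C_LOAD = 64.0  # Figure 5 load, in units of the first stage's input capacitance
# Stage 1 has size 1; stages 2 and 3 have sizes s2 and s3 (one fin per unit, assumed).


def delay(s2: float, s3: float) -> float:
    """Normalized chain delay: each stage adds GAMMA plus (load it drives / its own size)."""
    return 3 * GAMMA + s2 / 1.0 + s3 / s2 + C_LOAD / s3


def energy(s2: float, s3: float) -> float:
    """Normalized switched capacitance: gate + intrinsic capacitance of all stages, plus the load."""
    return (1 + GAMMA) * (1.0 + s2 + s3) + C_LOAD


def best_edp(sizes2, sizes3):
    """Exhaustive search: return (EDP, s2, s3) for the best pair among all candidates."""
    return min(
        (energy(a, b) * delay(a, b), a, b) for a, b in itertools.product(sizes2, sizes3)
    )


# Near-continuous sizing in steps of 0.05; k/20 keeps every whole number exactly on the grid.
fine2 = [k / 20 for k in range(20, 321)]    # 1.00 .. 16.00
fine3 = [k / 20 for k in range(20, 1281)]   # 1.00 .. 64.00
# Quantized sizing: whole fin counts only.
fins2 = range(1, 17)
fins3 = range(1, 65)

cont = best_edp(fine2, fine3)
disc = best_edp(fins2, fins3)
penalty = disc[0] / cont[0] - 1.0

if __name__ == "__main__":
    print(f"continuous: EDP={cont[0]:.1f} at s2={cont[1]:.2f}, s3={cont[2]:.2f}")
    print(f"quantized:  EDP={disc[0]:.1f} at s2={disc[1]}, s3={disc[2]}")
    print(f"EDP penalty from quantization: {100 * penalty:.2f}%")
```

Because the fine grid contains every whole-number point, the quantized EDP can never beat the near-continuous one, so the reported penalty is non-negative by construction. Restricting the candidate lists further, for example to even fin counts, could roughly mimic the fin-pair constraint of spacer lithography described in Section III-D.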
HSPICE will be used to determine the stability of each circuit and to evaluate the effect on noise margins when device width is varied in discrete increments. Completing the study requires HSPICE models for FinFET technology at the 90nm node; at present, we are still investigating whether such models will be available for our use.

A. Metrics
The deviation in performance (power, delay, and signal margins) is quantified by comparing the energy and delay at the optimum design point in the continuous sizing domain with the energy and delay at the optimum point in the discrete domain. This tradeoff can be quantified using the energy-delay product (EDP). The effect on reliable operation is measured by the noise margins for the inverter network, and by the maximum allowable voltage ripple, Δv, at Q and Q̄ for both the SRAM and flip-flop cells.

VII. SUMMARY
Device-width quantization has recently been identified as a possible technology disrupter for FinFET technology. Prior work shows that its effect on latch operation is relatively small, an 8% impact on performance. Tri-gate structures and device orientation optimization are possible processing- and device-level solutions; however, they incur costs in processing and layout. Our study employs Matlab and HSPICE to evaluate the effect of device-width quantization on the sizing of three circuit types: an inverter network, an SRAM cell, and an SR flip-flop. The difference between sizing in the continuous and discrete domains is quantified using the energy-delay product and the effect on noise margins.

REFERENCES
[1] E. J. Nowak et al., "Scaling Beyond the 65nm Node with FinFET-DGCMOS," in Proceedings of IEEE Custom Integrated Circuits Conference, 2003, pp. 339-342.
[2] K. Bernstein, C.-T. Chuang, R. Joshi, R. Puri, "Design and CAD Challenges in Sub-90nm CMOS Technologies," in Proceedings of IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Nov. 9-13, 2003, pp. 129-136.
[3] E. J. Nowak et al., "Turning Silicon On Its Edge," IEEE Circuits and Devices Magazine, vol. 20, no. 1, pp. 20-31, Jan.-Feb. 2004.
[4] T. Ludwig et al., "FinFET Technology For Future Microprocessors," in Proceedings of IEEE International SOI Conference, 2003, pp. 33-34.
[5] H. Qin, T. Ludwig, K. Bernstein, J. Rabaey, E. Nowak, "The Impact of Width Granularity on FinFET Latch Operation and Optimization," submitted to 2004 Symposium on VLSI Circuits.
[6] E. J. Nowak et al., "A Functional FinFET-DGCMOS SRAM Cell," in IEDM Technical Digest, 2002, pp. 411-414.
[7] R. Zlatanovici and B. Nikolić, "Power-Performance Optimal 64-bit Carry-Lookahead Adders," in Proceedings of European Solid-State Circuits Conference (ESSCIRC), 2003, pp. 321-324.
[8] L. Chang et al., "Moore's Law Lives On," IEEE Circuits and Devices Magazine, Jan. 2003, pp. 35-42.
[9] B. Doyle et al., "Tri-Gate Fully Depleted CMOS Transistors: Fabrication, Design and Layout," in Symposium on VLSI Technology, 2002, pp. 133-134.
[10] Y.-K. Choi, T.-J. King, and C. Hu, "A Spacer Patterning Technology for Nanoscale CMOS," IEEE Transactions on Electron Devices, vol. 49, no. 3, pp. 436-441, March 2002.
[11] J. M. Rabaey, A. Chandrakasan, B. Nikolić, Digital Integrated Circuits: A Design Perspective, Second Edition, Prentice-Hall, New Jersey, 2003.
[12] L. Chang, Nanoscale Thin-Body CMOS Devices, Ph.D. Thesis, University of California, Berkeley, Dept. of Electrical Engineering and Computer Science, 2003.
[13] T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein, Introduction to Algorithms, Second Edition, MIT Press/McGraw-Hill, Cambridge, MA, 2001.
[14] S. Hassoun and T. Sasao (Eds.), Logic Synthesis and Verification, Kluwer Academic Publishers, Boston, MA, 2002.