Flip-Flop Insertion with Shifted-Phase Clocks for FPGA Power Reduction


Hyeonmin Lim, Kyungsoo Lee, Youngjin Cho and Naehyuck Chang
School of CSE, Seoul National University, Korea

Abstract

Although the LUT (look-up table) size of FPGAs has been optimized for general applications, complicated designs may contain a large number of cascaded LUTs between flip-flops. This results in unwanted glitch propagation along the LUTs, and wastes power. We propose inserting new flip-flops between adjacent existing flip-flops to minimize glitch propagation and power loss. Each new flip-flop is timed by a phase-shifted clock, with the phase calculated from the delays of the LUTs and routing paths. This differs from traditional retiming methods, which use the original clock or a 180-degree clock for the new flip-flops and thus alter the original pipeline structure and synchronization. We start from a post-layout design and retain its clock frequency and timing behavior. Multiple flip-flop insertion is an NP-complete problem because each new flip-flop affects the delays in the design. We have devised a glitch generation and propagation model for LUT-based FPGAs that takes account of path delays while keeping complexity reasonable. We propose effective heuristics for flip-flop insertion and clock phase selection. Full-chip measurements, including all the overheads associated with the inserted flip-flops, show that our approach saves up to 38% of the total dynamic power. We have analyzed our scheme, showing the mechanics of clock assignment and glitch minimization, and the sources of power reduction.

I. INTRODUCTION

FPGAs (Field Programmable Gate Arrays) are generally more expensive, and have smaller capacity and lower performance, than other target devices that require fabrication.
Despite these drawbacks, FPGA implementations are often retained in final products, thanks to their short time-to-market, low financial risk, and the ease of making design changes. However, as the density of FPGAs increases, their power consumption becomes problematic. Even look-up table (LUT) based FPGAs, which are more frugal than other architectures, consume a significant amount of power. FPGA vendors offer various sizes of LUT, but an FPGA is rarely optimal for a particular design. In fact, complicated designs generally contain a large number of LUT cascades with two or more levels. This results in unwanted glitch propagation down the cascaded LUTs until a flip-flop terminates signal propagation, which causes a serious waste of power unless a well-designed glitch blocking method is applied.

In this paper, we introduce a power reduction technique for LUT-based FPGAs that blocks glitch propagation with inserted flip-flops. Reducing glitch propagation by inserting flip-flops is an established technique, but our approach to clock assignment is new.

One of the most distinctive characteristics of LUT-based FPGAs is their interconnect delay. Like deep sub-micron designs, LUT-based FPGAs exhibit path delays due to the high resistance and capacitance of the routing resources. Glitches may contribute up to 70% of the total power consumption of ASICs (application specific integrated circuits) [1] [2]. Power consumption due to glitches is even more serious for FPGAs. Interconnect resources dissipate at least 65% of total power in the Xilinx 4003 family, and more than 60% in the Xilinx Virtex-II family [3] [4]. But glitch blocking, using a combination of pipelining and retiming, can reduce the amount of energy consumed per operation by between 40% and 90% [5]. We will focus on the glitches that have the highest impact on power consumption.

Pipelining or retiming, or both, involves the insertion of flip-flops between combinational logic blocks. This reduces glitch propagation effectively, but in ASICs it requires more area for the additional flip-flops. Dividing each flip-flop into two latches timed by dual 180-degree phase-shifted clocks is one way to reduce the area requirement [6]. Although retiming by fixed-phase clock assignment is straightforward, it results in architectural modification and altered timing behavior. Consequent changes to factors such as pipeline depth, latency and synchronization with other pipelines may require the entire design to be revisited.

Our approach is completely different from traditional retiming techniques: we reactivate disabled flip-flops in logic blocks that comprise cascaded LUTs in large-scale combinational logic, clocking them with phase-shifted clocks rather than the original (in-phase) clock. We determine the amount of phase shift by considering the delays in the LUTs and routing paths. Since the resolution of the phase shift and the number of dedicated clock lines are both limited, we insert flip-flops in an order that maximizes the benefit from glitch blocking, in case the process has to be terminated because clock resources are exhausted.

Since the slack in a net is dependent on the slack of other nets, flip-flop insertion is an NP-complete problem. We have proved its NP-completeness and devised an appropriate heuristic solution. Unless a cascaded LUT is on a critical path, we can exploit its slack. Thus, even though the delays of two signal paths are not identical, the inserted flip-flops may share the same phase clock. Our heuristics aim at the maximum power saving by appropriate grouping of the inserted flip-flops.

We formulate a glitch generation and propagation model which can take account of routing path delays while keeping design complexity reasonable. LUT-based logic has different glitch generation and propagation characteristics from ASICs.
Our model of the LUT takes the SRAM structure into account. By adding power consumption models, we have built an FPGA dynamic power simulator for flip-flop insertion. Our approach considers all the overheads of flip-flop insertion, taking account of the switching capacitance of each flip-flop and its dedicated clock path. Furthermore, the final result is a real full-chip measurement, and thus shows the actual power saving, including every overhead.

Fig. 1. Glitch minimization by insertion of unused flip-flops. (a) One-level LUT (no glitch propagation). (b) Two-level LUT (glitch propagation). (c) Glitch block w/in-phase clock. (d) Glitch block w/shifted-phase clock.

Fig. 2. Timing diagram of (a), (b), (c), and (d) in Figure 1.

Measurements are backed up by a complete analysis of the clock assignment process, the extent of glitch reduction and of dynamic power reduction. The results of this analysis accord with the measurement results, and provide further justification of the effectiveness of the proposed scheme.

Our power-reduction scheme starts with a post-layout design, and retains the original clock frequency and timing behavior. Because our approach preserves timing and logical behavior, we can apply it in tandem with other FPGA power-reduction techniques [7] [8]. Because we are using FPGAs and not ASICs, we do not have to sacrifice any area for flip-flop insertion; we can simply activate unused flip-flops [2]. Phase shifting of the original clock and additional dedicated clock lines are already available as built-in functions on FPGAs, and require no additional silicon area. The Virtex-II architecture from Xilinx Inc. provides a logic block which can shift the phase in steps of 50ps or 1/6 of the input clock period [9]. FPGAs from Altera can also shift the phase of the input clock using PLLs [10]. This functionality makes it possible to generate phase-shifted clocks without using either additional logic blocks on the FPGA or off-chip support. We used the Virtex-II architecture as our target device, and the Xilinx FPGA design tool for functions such as synthesis, timing simulation, placement, and routing.

The rest of this paper is organized as follows. We clarify the problem in Section II. Section III introduces a LUT-based FPGA energy model, constructed in terms of effective capacitance and switching activity.
Based on this model, we propose an effective phase-shifted clock assignment algorithm, called Solve-FPA, in Section IV. The experimental results are shown in Section V.

II. PROBLEM STATEMENT

A. Concept of glitch blocking

Glitches in LUT-based FPGAs can be blocked, without performance degradation, by activating unused flip-flops and clocking them with a phase-shifted clock. In Fig. 1(a), a glitch generated by $L_1$ is blocked by $FF_1$; thus there is no additional power consumption due to glitch propagation in this one-level LUT. In Fig. 1(b), a glitch generated by $L_1$ propagates to $L_2$ and causes additional power consumption in the routing path from $L_1$ to $L_2$ and in $L_2$. By activating an unused flip-flop, $FF_1$, we can stop glitch propagation from $L_1$ to $L_2$, as we see in Fig. 1(c). Note that $FF_1$ is triggered by the system clock (i.e. the in-phase clock). Thus insertion of $FF_1$, clocked by the in-phase clock, causes a change in system behavior.

The timing diagram of Fig. 1 is shown in Fig. 2. In Figs. 2(a) and 2(b), evaluation of $L_1$ and $L_2$ is finished in one clock cycle. However, as shown in Fig. 2(c), a traditional approach would require an additional clock cycle to finish the evaluation of $L_2$.

B. Shifted-phase clock approach

We now use the slack time that exists before the new flip-flop is inserted to negate the change in system behavior. Slack time exists between flip-flops which are not on the critical path, or between all flip-flops when the design is not operated at the maximum clock speed. In Fig. 2(b), the total delay of the two LUTs, $L_1$ and $L_2$, is shorter than the clock cycle of the system clock, which means that the input of the terminating flip-flop will arrive sooner than the triggering edge of the system clock. We can use this slack time to insert the flip-flop, without an additional clock cycle, by triggering the new flip-flop with a new phase-shifted clock $\phi_1$. Fig. 1(d) shows the insertion of $FF_1$, to be triggered by the shifted-phase clock $\phi_1$. The phase of the new clock is chosen to match the logic block and interconnect delay, and thus the original system behavior is restored. Fig.
2(d) shows how the slack time determines the phase of $\phi_1$. When $L_1$ is executed as soon as possible and $L_2$ is executed as late as possible, the slack time between $L_1$ and $L_2$ is maximized. Since we are considering rising-edge flip-flops, the allowable phase of the clock $\phi_1$ that triggers $FF_1$ is bounded by the amount of slack time between $L_1$ and $L_2$. Note that the width of $\phi_1$ is slightly less than the slack time because of the setup and hold times of the flip-flop. Moreover, we cannot insert a flip-flop on the critical path without degrading the maximum operating frequency of the original design, because of the setup/hold times of the flip-flop. Comparing Fig. 2(d) with Fig. 2(b), we can minimize glitch propagation by triggering the newly inserted flip-flop with $\phi_1$ without changing the system behavior.

C. Dynamic power saving

After an unused flip-flop is activated, glitch propagation is blocked, and thus dynamic power is saved. Since the two major elements of dynamic power consumption are switching activity and load capacitance, we compute the number of glitches blocked by the inserted flip-flop and the

total load capacitance influenced by each blocked glitch. The dynamic power saved by activating a flip-flop is obtained by multiplying the number of blocked glitches by the total effective capacitance affected by each.

D. Overhead of flip-flop insertion

While new flip-flops save power by blocking glitch propagation, they also dissipate additional power. After a flip-flop is inserted, the glitch which would otherwise have been propagated now hits the new flip-flop. The resulting power overhead is the product of the total number of blocked glitches and the capacitance of the inserted flip-flop.

Using the phase-shifted clock is also a power overhead. Since the clock in an FPGA uses a global net which is routed through the entire chip, the effective capacitance of the global wiring required for a new clock cannot be neglected.

E. Flip-flop insertion as an optimization problem

Given an FPGA design after placement and routing, we need an assignment of the phase-shifted clocks that deploys unused flip-flops for maximum power saving. We call this the flip-flop and phase-shifted clock assignment (FPA) problem. An instance of this problem is characterized by $(D, n_c, r_p)$, where $D$ is the FPGA design, $n_c$ is the available number of new clocks and $r_p$ is the reciprocal of the phase shift resolution. The factors $n_c$ and $r_p$ are dependent on the target FPGA architecture.

Ideally, we would achieve maximum glitch blocking by inserting flip-flops wherever LUTs are directly connected to each other. Each newly inserted flip-flop would be clocked by an appropriate phase-shifted clock, reflecting the delay calculations. In reality this is not feasible due to the limitations of the phase-shifted clocks. The number of phase-shifted clocks is limited by the clock lines in the FPGA, and the phases of the clocks are limited by the resolution of the PLLs or DLLs.
As a result, we have to choose the locations for flip-flop insertion, and the corresponding clock phases, carefully, so as to achieve the maximum power saving by glitch blocking. This problem can be summarized as follows:

Problem 1: FPA: Let us assume that the target FPGA has clock lines $c = (c_1, \ldots, c_{n_c})$ and PLL/DLL phase shift slots $p = (p_1, \ldots, p_{r_p})$. For a given $(D, n_c, r_p)$, determine the phase selection $f_p : c_i \to p_j$, where $1 \le i \le n_c$ and $1 \le j \le r_p$, such that $f_p$ results in the maximum power saving without changing the logical behavior of $D$.

Once the FPA problem has been solved, flip-flop insertion becomes straightforward using existing resources. We can activate unused flip-flops using the clock lines.

III. POWER CONSUMPTION OF LOOK-UP TABLE BASED FPGAS

A. The Virtex-II architecture

Virtex-II devices are user-programmable gate arrays consisting of input/output blocks (IOBs) and configurable logic blocks (CLBs). CLBs consist of four slices, each of which includes two 4-input function generators, carry logic, arithmetic logic gates, wide function multiplexors and two storage resources. Each 4-input function generator is configurable as a 4-input LUT, as a 16-bit shift register element, or as a 16-bit distributed SelectRAM memory. The output from the function generator in each slice drives both the slice output and the D input of the storage element. The storage resources are configurable as either edge-triggered D-type flip-flops or level-sensitive latches. We assume that function generators and storage resources are fixed as 4-input LUTs and D flip-flops, respectively.

TABLE I
THE EFFECTIVE CAPACITANCE OF EACH RESOURCE IN THE VIRTEX-II 1000-FG456.

Type                  Resource          Capacitance (pF)
Interconnect per CLB  Input crossbar    9.44
                      Output crossbar
                      Double
                      Hex
                      Long
Logic per CLB         LUT inputs
                      Flip-flop inputs
Clocking              Global
                      Local

All routing resources are segmented into long, hex and double lines. Long lines are bidirectional wires that distribute signals across the device.
Vertical and horizontal long lines span the full height and width of the device. Hex lines route signals to every third or sixth block in all four directions. Double lines route signals to every first or second block in all four directions. In addition, there are two sets of switches that connect the wire segments to the input and output of each CLB: the input and output crossbars.

The digital clock manager (DCM) is a logic block for clock management in the Virtex-II architecture. While its main features are clock de-skew, phase adjustment, and frequency synthesis, we can utilize it effectively for phase shifting of the system clock. The DCM can create a phase-shifted output with a resolution of 50ps or 1/6th of the input clock period. The number of DCMs, which is the number of available new clocks, is dependent on the device, but the minimum is four [9].

B. Effective capacitance

Dynamic power dissipation is caused by signal transitions in the circuit, and comprises two parts: i) charge and discharge of capacitance; and ii) short-circuit power. The dynamic power consumption caused by charge and discharge of capacitance in a CMOS circuit is given by

\[ \mathrm{Power} \propto \sum_{x} C_x \, V_{dd}^{2} \, f_x, \tag{1} \]

where $C_x$ is the load capacitance of the net $x$, $V_{dd}$ is the supply voltage, and $f_x$ is the transition rate of $x$. Since short-circuit power dissipation is mostly caused by switching, it can be modeled as an additional capacitance in Eq. 1. Thus the effective capacitance is defined as the sum of the parasitic effects due to interconnection wires and transistors, and the emulated capacitance due to short-circuit currents [4]. The effective capacitance of each resource is shown in Table I. The effective capacitances of long and global wires are dependent on the size of the target FPGA, and we choose values (from [4]) that are consistent with the Virtex-II 1000-FG456.

C. Switching activity

We measure switching activity using the concept of transition density.
The transition density of a net $x$, $D(x)$, is defined as $\lim_{T \to \infty} n_x(T)/T$, where $n_x(T)$ is the number of transitions of $x$ in the time interval $(-T/2, +T/2]$ [11]. To compute the exact power saved by flip-flop blocking, we need to know the number of transitions which are glitches. To achieve this, we compute the transition density with and

without considering the delay, and take the difference between these two values.

First, without considering the delay: when the time interval of the transition density is a clock period, we model the transition density as the probability of a transition in a clock period. The Boolean difference $\partial y/\partial x_i$ is used to compute the propagation rate of a transition from the input to the output of the logic block [11]. The probability of a Boolean difference, $P(\partial y/\partial x_i)$, is calculated by dividing $\partial y/\partial x_i$ by $2^n$, which is the total number of input patterns. We can extend the Boolean difference to accommodate multiple input transitions as

\[ \frac{\partial y}{\partial (x_i x_j)} = y|_{x_i=1,\,x_j=1} \oplus y|_{x_i=0,\,x_j=0} + y|_{x_i=1,\,x_j=0} \oplus y|_{x_i=0,\,x_j=1}, \tag{2} \]

and derive the transition density of the output $y$:

\[
\begin{aligned}
D(y) = {} & P\!\left(\frac{\partial y}{\partial x_1}\right) D(x_1)\,(1 - D(x_2))\,(1 - D(x_3)) \cdots (1 - D(x_n)) \\
 & + P\!\left(\frac{\partial y}{\partial x_2}\right) (1 - D(x_1))\,D(x_2)\,(1 - D(x_3)) \cdots (1 - D(x_n)) \\
 & + \cdots \\
 & + P\!\left(\frac{\partial y}{\partial (x_1 x_2)}\right) D(x_1)\,D(x_2)\,(1 - D(x_3)) \cdots (1 - D(x_n)) \\
 & + \cdots \\
 & + P\!\left(\frac{\partial y}{\partial (x_1 x_2 \cdots x_n)}\right) D(x_1)\,D(x_2)\,D(x_3) \cdots D(x_n).
\end{aligned}
\tag{3}
\]

The first term of this equation is the probability that a transition occurs only at the input $x_1$. We sum these probabilities over all of the input patterns, assuming no input correlation.

The glitch propagation characteristics of LUTs are different from those of ASICs, and we need to take the SRAM structure of LUTs into account. In a LUT implemented as an SRAM, the difference in input delays causes skew between input signals. This is shown for a 4-input LUT in Fig. 3, where the number of transitions for all 16 cases is $\partial y/\partial x_1 + \partial y/\partial x_2 + \partial y/\partial x_4$ if the delays are considered, or $\partial y/\partial (x_1 x_2 x_4)$ if not. Note that these values are independent of the sequence of inputs, because the Boolean difference represents the number of transitions for every input pattern.

Fig. 3. The number of transitions in $y$ for the transition of $x_1, x_2, x_4$ in a 4-input LUT. (a) Without delay (no input skew). (b) With delay (input skew).

We can now compute the transition density of the output $y$ with delay considered, by replacing each $\partial y/\partial (x_i x_j \cdots x_k)$ in Eq. 3 with $\partial y/\partial x_i + \partial y/\partial x_j + \cdots + \partial y/\partial x_k$. We define $D'(y)$ as the transition density of $y$ with delay, and it is given by

\[ D'(y) = \min\!\left( \sum_{i=1}^{n} P\!\left(\frac{\partial y}{\partial x_i}\right) D'(x_i),\; \tau/\mu \right). \tag{4} \]

Because a pair of transitions spaced more closely than the logic delay of the gate is eliminated, we assign $\tau/\mu$ as the maximum transition density, where $\tau$ is the clock period and $\mu$ is the delay of the LUT, assuming that the transitions are equally spaced. This assumption may slightly overestimate the glitch transitions, but it does not cause appreciable error, because we use only the relative values of the transition density in the algorithm to be described in Section IV. We have checked the extent of this overestimation by perturbing the value of $\tau/\mu$. Note that $D'(x)$ is equal to $D(x)$ when $x$ is the input to the logic.

Although Najm's model [11] looks similar to Eq. 4, it is not capable of pulse absorption, in which a pulse is eliminated when it passes through an LUT if it is narrower than the delay of the LUT. This omission results in significant overestimation of the transition density. Another experiment [12], in which Najm's model was compared with IRSIM, a switch-level simulator, supports this fact. Anderson and Najm [13] estimate the activity of the nets in an FPGA by regression analysis. Although this method is more accurate, long computation times and a complicated procedure militate against its use in a fully automated algorithm.

    Solve-FPA
    input  : FPGA design D, c = (c_1, ..., c_nc), p = (p_1, ..., p_rp)
    output : power-minimized FPGA design D_new
    begin
      get graph G from FPGA design D
      for each new clock c_i
        get weight w and interval ξ for each vertex v in G
        for each phase p_j
          get maximum phase weight mpw(p_j)
        endfor
        select p_max with maximum mpw(p_j)
        if mpw(p_max) ≤ overhead break
        assign p_max to c_i
        modify G with c_i
      endfor
      get D_new from G
      return D_new
    end

Fig. 4. A summary of Solve-FPA.

Using $D(y)$ and $D'(y)$, we can compute the number of transitions which are glitches.
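
The two transition-density estimates (Eq. 3 without delay, Eq. 4 with delay) can be illustrated with a short Python sketch. This is a minimal illustration rather than the authors' simulator: the function names, the truth-table encoding (bit i of the table index is input x_i), and the choice to normalize the Boolean difference as "fraction of the 2^n input patterns whose output toggles when the chosen inputs flip" are all assumptions made for this example.

```python
from itertools import combinations


def toggle_probability(truth_table, n, subset):
    """Assumed reading of P(dy/d(subset)): the fraction of the 2**n input
    patterns for which the output changes when every input in `subset` flips."""
    mask = 0
    for i in subset:
        mask |= 1 << i
    changed = sum(
        1 for x in range(2 ** n) if truth_table[x] != truth_table[x ^ mask]
    )
    return changed / 2 ** n


def density_without_delay(truth_table, n, d_in):
    """Eq. 3 style estimate: sum over every non-empty set of inputs that
    switch together, assuming the inputs are uncorrelated."""
    total = 0.0
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            term = toggle_probability(truth_table, n, subset)
            for i in range(n):
                term *= d_in[i] if i in subset else (1.0 - d_in[i])
            total += term
    return total


def density_with_delay(truth_table, n, d_in_delayed, clock_period, lut_delay):
    """Eq. 4 style estimate: skewed inputs are treated as separate
    single-input transitions, capped at clock_period / lut_delay."""
    raw = sum(
        toggle_probability(truth_table, n, (i,)) * d_in_delayed[i]
        for i in range(n)
    )
    return min(raw, clock_period / lut_delay)


# Hypothetical 2-input XOR LUT with both input densities equal to 0.5.
XOR2 = [0, 1, 1, 0]
d_plain = density_without_delay(XOR2, 2, [0.5, 0.5])
d_skewed = density_with_delay(XOR2, 2, [0.5, 0.5], clock_period=10.0, lut_delay=1.0)
glitch_density = d_skewed - d_plain
```

For this XOR example the delay-free estimate is 0.5 and the delay-aware estimate is 1.0, so half a transition per clock is attributed to glitches: when both inputs switch in the same cycle, an ideal XOR output stays put, but skewed inputs make it toggle twice.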
We define $G_{trans}(y)$ as the glitch transition in $y$. $G_{trans}$ can be computed as

\[ G_{trans}(y) = D'(y) - D(y). \tag{5} \]

There are also arithmetic logic gates and multiplexers in the slices of the Virtex-II. We apply the same model to these dedicated logic blocks as we use for the LUTs.

IV. GLITCH MINIMIZATION TECHNIQUES FOR LOOK-UP TABLE BASED FPGAS

A. Proposed algorithm

Addressing the FPA problem from scratch, we see that activating flip-flops with a shifted-phase clock must yield the maximum power saving by glitch blocking (called the weight in this paper). To achieve this objective, we consider the weights of all the unused flip-flops. Note that several flip-flops can be activated by a single phase-shifted clock if the phase $p$ of that clock is in the range of those flip-flops. Thus we need to know the maximum weight that can be obtained for each possible phase. We show that this problem is NP-hard, and address it with a greedy algorithm. Because

activating a flip-flop changes the weights and timing values of the downstream components, computing optimal phases for all the available clocks at once is not tractable. Therefore, we set the phase of one clock at a time. Once we have activated flip-flops with a particular clock, we recompute the weight and timing values and repeat the procedure for the next clock, until there remains no clock to insert or no further power saving is produced. We call the resulting power-minimizing phase-shifted clock assignment algorithm Solve-FPA (Fig. 4).

Fig. 5. Overall design flow of Solve-FPA: NCD, Xilinx design tool (4.2), XDL and SDF, graph generation (4.2), directed acyclic graph, weight generation (4.3), slack calculation (4.4), weighted slack interval graph, phase assignment (4.5), power saving analysis (4.6), design conversion (4.6), new NCD.

Fig. 6. DAG generated from an FPGA design (vertices $v_1$ to $v_{11}$, edges $e_1$ to $e_{10}$).

The overall procedure of Solve-FPA, shown in Fig. 5, is fully automated. The algorithm starts from an original FPGA design in a post-layout netlist format. The FPGA design is converted to a directed acyclic graph (DAG) at the graph generation stage. Then the timing simulation stage extracts delay information from the FPGA design using the Xilinx design tool. Next, we compute the extent of the power saving: this is the weight generation stage. Depending on the cumulative delay of the flip-flops and nets in front of and behind the new flip-flop, we calculate the maximum permissible range of its clock phase in the slack calculation stage. Then we select a phase-shifted clock that will give the maximum power saving, using a vertex-weighted DAG with slack times: this is the phase assignment stage. Then we decide whether or not to iterate, taking the clock insertion overhead into consideration, at the power saving analysis stage.
Finally, the new FPGA design is generated at the FPGA design conversion stage by converting the modified graph to the FPGA netlist format.

B. Graph generation and timing simulation

The first step of the algorithm uses the Xilinx FPGA design tool to generate the FPGA design as a binary file, in NCD format. We translate it into the Xilinx Description Language (XDL), a text format, for flexibility in editing the design. Timing simulation is also performed using the Xilinx design tool, which generates outputs in Standard Delay Format (SDF).

Using the XDL file, we convert the FPGA design to a DAG. Each vertex of the DAG is associated with an LUT and its following flip-flop, or only with an LUT if the flip-flop is not used. The nets are then converted into edges (Fig. 6). We will now define some additional terms. For every $u, v \in V$, if there is an edge $u \to v$, then $u$ is the predecessor of $v$, and $v$ is the successor of $u$. If there is a path $u \leadsto v$, then $u$ is the ancestor of $v$, and $v$ the descendant of $u$.

C. Weight generation

The weight of a vertex denotes the possible extent of power saving when the unused flip-flop at that vertex is activated. This step requires the effective capacitance of each vertex to be determined. The effective capacitance of a vertex $v$, $C_{eff}(v)$, is defined as the sum of the effective capacitances of the resources used in the output edges of $v$. We compute $C_{eff}(v)$ by summing the products of the number of resources of each type used with their unit effective capacitances. The effective capacitance of the various resources is shown in Table I.

When a glitch is not blocked at vertex $v$, it will propagate to the descendants of $v$ until it meets an in-phase flip-flop. A glitch first wastes power at $v$, and power is also wasted in its successors. But the extent of a glitch at a successor $u$ will be reduced by a factor of $\partial y/\partial x$, where $x$ and $y$ are the input and output edges of $u$, respectively.
In other words, we can say that the effective capacitance is reduced by a factor of $\partial y/\partial x$ from the viewpoint of $v$. We therefore define the total capacitance, $C_{total}(v)$, as the total effective capacitance of the circuit under the impact of $G_{trans}(v)$. We can compute $C_{total}(v)$ as

\[ C_{total}(v) = C_{eff}(v) + \sum_{u \in fanout(v)} C_{total}(u) \cdot P\!\left(\frac{\partial y}{\partial\, vu}\right), \tag{6} \]

where $fanout(v)$ is the set of vertices which comprise the fanout of $v$, $vu$ is the edge $v \to u$, and $y$ is the output of $u$. Finally, the weight of $v$ is given by

\[ w(v) = G_{trans}(v) \cdot C_{total}(v). \tag{7} \]

Since the supply voltage is constant in Eq. 1, this definition of weight is sufficient to represent the dynamic power reduction achieved by activating the flip-flop at $v$. In summary, the output of the weight generation stage is a vertex-weighted DAG $G = (V, E, w)$.

D. Slack calculation

The slack calculation determines the amount of freedom available to each newly inserted flip-flop, which is the permissible range of phases. It utilizes the DAG $G = (V, E, w)$ and the delay information from the SDF file. Using the permissible ranges, we convert $G$ to the slack interval graph $G_{int}$, which is a weighted interval graph that takes the precedence relations among the vertices into consideration.

We define the slack function $\xi : v \to (\mathbb{R}, \mathbb{R})$, which maps a vertex $v$ to two real numbers: a starting-point and an end-point. These are calculated under the condition that all the

ancestors of $v$ are executed as early as possible and all the descendants of $v$ are executed as late as possible. For instance, let the cumulative delay in front of the new flip-flop be $\tau_f$, and let the delay behind the flip-flop be $\tau_b$. If the clock period is $\tau$, and $\tau > \tau_f + \tau_b$, the permissible phase range will be $[2\pi\tau_f/\tau,\ 2\pi(\tau - \tau_b)/\tau]$, which becomes the slack available to the new flip-flop.

We convert $G = (V, E, w)$ to $G_{int} = (V_{int}, w_{int}, \xi, \Psi)$ by translating each vertex to a slack interval, $v_{int} \in V_{int}$, with starting and end-points determined by $\xi(v_{int})$. The weight of a slack interval, $w_{int}(v_{int})$, in $G_{int}$ is inherited from the weight of $v$ in $G$, i.e. $w_{int}(v_{int}) = w(v)$. $\Psi$ is a set of links among slack intervals such that $u_{int}$ and $v_{int}$ are linked if and only if there is a path between $v$ and $u$ in the original DAG $G$. Fig. 7 shows a slack interval graph derived from Fig. 6, where the letters a to e denote the available phases.

Fig. 7. Weighted slack interval graph converted from a DAG (slack intervals of $v_1$ to $v_{11}$ with weights $w(v)$; available phases a to e).

[Fig. 8 graph data: $E = \{(v_1,v_2), (v_2,v_3), (v_1,v_3), (v_1,v_4)\}$; $\Psi = \{\psi(v_1,v_2), \psi(v_2,v_3), \psi(v_1,v_3), \psi(v_1,v_4)\}$; all intervals span $[\xi_s, \xi_e]$.]

E. Phase assignment

Phase assignment selects the shifted-phase clock that gives the maximum benefit. A slack interval graph $G_{int} = (V_{int}, w_{int}, \xi, \Psi)$ is the input to the phase assignment stage and $(G_{int}, f_p)$ is the output, where $f_p$ is the clock selection function defined in the FPA problem. For every possible phase $p$, there is a set $V_p$ of slack intervals which can be triggered by a $p$-shifted clock, so that for all $v_{int} \in V_{int}$, $v_{int} \in V_p$ if and only if the phase $p$ is between the starting and end-points of $\xi(v_{int})$. We call $V_p$ the $p$-crossing set. We define the maximum benefit of selecting the phase $p$ as the maximum phase weight, $mpw(p)$.
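
The slack interval and the p-crossing set described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: the function names are hypothetical, setup and hold margins are ignored, and the interval is read as "the new flip-flop may fire no earlier than τ_f after the in-phase edge and no later than τ_b before the next in-phase edge", which is an interpretation of the slack calculation rather than a quoted formula.

```python
import math


def slack_interval(tau_f, tau_b, tau):
    """Permissible clock phase (radians) for a flip-flop with cumulative
    delay tau_f in front of it and tau_b behind it, for clock period tau.
    Returns None when there is no slack (for example on a critical path)."""
    if tau_f + tau_b >= tau:
        return None
    start = 2.0 * math.pi * tau_f / tau
    end = 2.0 * math.pi * (tau - tau_b) / tau  # assumed upper bound, see lead-in
    return (start, end)


def p_crossing_set(intervals, phase):
    """All vertices whose slack interval contains the candidate phase."""
    return {
        v for v, (start, end) in intervals.items() if start <= phase <= end
    }


# Hypothetical delays in nanoseconds for a 10 ns clock.
example = slack_interval(tau_f=2.0, tau_b=3.0, tau=10.0)

# Hypothetical slack intervals (radians) for three candidate flip-flops.
intervals = {"v6": (1.0, 3.0), "v7": (2.5, 4.0), "v8": (0.0, 2.0)}
crossing = p_crossing_set(intervals, phase=2.7)
```

With these made-up numbers the flip-flop may be clocked anywhere between 20% and 70% of the clock period, and a phase of 2.7 rad crosses the intervals of v6 and v7 but not v8.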
If there were no link in $G_{int}$, i.e. $\Psi = \emptyset$, we could obtain $mpw(p)$ by simply summing the weights of the slack intervals in $V_p$. However, finding $mpw(p)$ is difficult, due to the links within the set of vertices $V_{int}$. If $\psi(u_{int}, v_{int}) \in \Psi$, then even though $u_{int}, v_{int} \in V_p$, $u_{int}$ and $v_{int}$ cannot both be assigned to $p$ without changing the system behavior. Therefore, the links between slack intervals become constraints in computing $mpw(p)$. For example, if $\Psi$ in Fig. 7 were an empty set, then $mpw(d)$ would be $w(v_9) + w(v_7) + w(v_6) + w(v_8)$. However, $\Psi$ is not empty, and so $mpw(d)$ is actually $\mathrm{MAX}(w(v_6),\ w(v_7) + w(v_8),\ w(v_9) + w(v_8))$. We call the optimization that finds $mpw(p)$ with a given $G_{int}$ and phase $p$ the maximum phase weight (MPW) problem, and we claim that it is NP-hard.

Problem 2: MPW decision: Let us assume that a graph $G_{int}$, a phase $p$, and a given positive real number $k$ constitute an instance of the MPW problem. For a given $V_p$, we now seek to determine whether $mpw(p) \ge k$.

Theorem 1: The MPW decision problem is NP-hard.

Fig. 8. Conversion between vertex-weighted graph $G$ and weighted interval graph $G_{int}$. (a) Vertex-weighted graph $G$. (b) Weighted interval graph $G_{int}$.

Proof: We show that the maximum weighted independent set problem, which has been shown to be NP-hard [14], can be reduced in polynomial time to the MPW decision problem. Let us assume that we have a vertex-weighted graph $G = (V, E, w)$. First we convert $G$ to an interval graph $G_{int} = (V_{int}, w_{int}, \xi, \Psi)$, such that $\xi : v \to (\xi_s, \xi_e)$, where $\xi_s$ and $\xi_e$ are positive real numbers and $\xi_s < \xi_e$ (Fig. 8). $V$ is now translated to the set of slack intervals $V_{int}$, with starting and end-points determined by the function $\xi$. Because the range of all the intervals is $[\xi_s, \xi_e]$, there exists a phase $\xi_s \le p \le \xi_e$ such that all the intervals are $p$-crossing, i.e. $V_{int} = V_p$.
$\Psi$ is obtained from the edges $E$ of $G$ such that, for all $u, v \in V$, when $u$ and $v$ are adjacent, the corresponding intervals $u_{int}$ and $v_{int}$ are linked, i.e. $\psi(u_{int}, v_{int}) \in \Psi$. Finally, $w(v)$ is converted to $w_{int}(v_{int})$. Therefore, if a vertex set in $G$ is an independent set, the corresponding interval set in $G_{int}$ has no links within it. Consequently, $G$ has an independent set whose sum of weights is not less than $k$ if and only if there exists $V_p \subseteq V_{int}$ with $mpw(p) \ge k$.

Having shown the MPW problem to be NP-hard, we apply a greedy algorithm to solve it. In real data, we found that a small number of vertices in $V_p$ usually have dominant weights. To capitalize on this, we sort the vertices before computing $mpw(p)$. Then, we select the vertex with the maximum weight and remove all the vertices which are linked to this vertex. We repeat this process until the set $V_p$ is empty.

F. Power saving analysis and FPGA design conversion

The sequence of procedures, from weight generation to phase assignment, which we have just described, determines the shifted-phase clock which maximizes the power saved by glitch blocking. We iterate through these procedures until we have used all the $n_c$ clock resources, or until the insertion of more flip-flops does not save any more power because of the overheads, even though some clock resources remain unused.

The power overhead of an inserted flip-flop is made up of the power consumption of the new flip-flop and that of the global wires used by the new clock. Power consumption by the flip-flop can be computed by multiplying $G_{gen}(v)$ by the input capacitance of the flip-flop (Table I), and the overhead from clock routing can be computed by multiplying the effective capacitance of both global and local clock routing resources by the transition density of the clock, which is 1. We can then compare $mpw(p)$ with the sum of the two overheads, and insert a phase-shifted clock only when the total overhead is smaller than $mpw(p)$.
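
The greedy selection and the overhead check described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `greedy_mpw` and `worth_inserting` are hypothetical names, links are stored as unordered pairs, ties are broken by vertex name for determinism, and the example weights are invented. The link set in the example is inferred from the MAX expression given for mpw(d), where v6 conflicts with v7, v8 and v9, and v7 conflicts with v9.

```python
def greedy_mpw(weights, crossing, links):
    """Greedy estimate of mpw(p): repeatedly take the heaviest remaining
    interval in the p-crossing set, then drop every interval linked to it."""
    remaining = set(crossing)
    chosen = []
    total = 0.0
    while remaining:
        # Sorting first makes tie-breaking deterministic (max keeps the
        # first maximum it encounters).
        best = max(sorted(remaining), key=weights.get)
        chosen.append(best)
        total += weights[best]
        remaining.discard(best)
        remaining -= {v for v in remaining if frozenset((best, v)) in links}
    return chosen, total


def worth_inserting(mpw, flip_flop_overhead, clock_overhead):
    """Insert the phase-shifted clock only if the saving exceeds both overheads."""
    return flip_flop_overhead + clock_overhead < mpw


# Hypothetical weights for the d-crossing set of the running example.
weights = {"v6": 1.0, "v7": 3.0, "v8": 2.0, "v9": 7.0}
links = {
    frozenset(("v6", "v7")),
    frozenset(("v6", "v8")),
    frozenset(("v6", "v9")),
    frozenset(("v7", "v9")),
}
chosen, estimate = greedy_mpw(weights, {"v6", "v7", "v8", "v9"}, links)
```

Here the dominant weight of v9 is picked first, which removes v6 and v7, and v8 is added afterwards, so the greedy estimate equals w(v9) + w(v8). When no single weight dominates (for instance w(v6) = 5, w(v9) = 4, w(v8) = 2), the same procedure returns w(v6) alone and misses the better pair, which is the price of the heuristic.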
Unfortunately, inserting a new flip-flop may change the weights and the slack intervals of the design. To achieve a global optimum, the phase assignment of all the additional clocks would need to be done simultaneously, but this implies an unacceptably complicated calculation. Instead, we perform flip-flop insertion and phase assignment sequentially, i.e. in a greedy manner, which makes the optimization converge to a local optimum. Experimental results and actual measurements of practical benchmark sets show that the

greedy approach is effective in blocking glitch propagation and does yield significant power saving. After selecting the clock phases and flip-flops that save the most power, we modify the XDL file to trigger the flip-flops with the phase-shifted clock that we have chosen. We then convert the modified XDL file to an NCD file, which is compatible with the Xilinx FPGA design tool.

V. EXPERIMENTS

To reinforce the simulation results, we measured a real Xilinx FPGA device, a Virtex-II 1000-FG456 with 5120 slices and a core voltage of 1.5 V, using our in-house FPGA cycle-true energy measurement tool based on switched capacitors [15], which is shown in Fig. 9. Fig. 10 shows the difference in cycle-true energy consumption between the original C6288 and the C6288 logic modified by Solve-FPA. It demonstrates that Solve-FPA achieves progressively greater dynamic energy reduction as groups of flip-flops triggered by shifted-phase clocks are added. Fig. 11 shows how the distribution of dynamic energy caused by glitches changes as flip-flops are inserted. Because the addition of shifted-phase clocks incurs an energy overhead, the extent of the energy saving decreases as more phases are used. Depending on the circuit structure, the optimal number of phases for the benchmark circuits varies from one to six (Fig. 12). The greatest saving is achieved on C6288, using six phases to achieve a 31% reduction in dynamic energy consumption. More detailed testbench results are summarized in Table II. For most of the benchmark circuits, Solve-FPA achieves a significant dynamic power reduction with the first phase of flip-flop insertion alone.

Fig. 9. Test environment of FPGA cycle-true energy measurement.

Fig. 10. Dynamic energy reduction using Solve-FPA (C6288). Axes: dynamic energy (nJ/clock) versus clock cycle; curves: original design and first- through sixth-phase flip-flop insertion.
Fig. 11. Histograms of net dynamic energy caused by a glitch in C6288 logic, with a sequence of modifications by Solve-FPA. Panels: original design, and first-, third-, and sixth-phase flip-flop insertion; axes: energy (pJ/clock) versus phase (degrees).

The test cases are of three classes: Xilinx core generators, ISCAS85, and an FIR filter. The Addr14 and Multiplier14 cases shown in the first two rows of Table II are a 14-bit adder and multiplier generated by the Xilinx core generator. The next three are ISCAS85 circuits, which are ASIC benchmarks implemented as LUTs after optimization. In these cases it is less meaningful to apply our algorithm, and therefore we have omitted the results for those circuits. The final circuit is a direct-form FIR filter with 16 pipeline stages. Unlike the other, combinational circuits, in this filter flip-flops are used for purposes other than feedback. We used real input vectors for the FIR filter to make the experiment as realistic as possible, while the ISCAS85 circuits and the adder and multiplier were driven by random vectors. The number of slices used in these designs is between % and % of the target FPGA. We have duplicated some circuits that occupy only a small number of slices in the interest of more accurate measurement. The second column of Table II, titled "Number of modules", indicates the number of duplications. All the benchmarks were implemented on one or two thousand slices, except the 14-bit adder (Addr14) and C499 from ISCAS85. These exceptions are caused by the limited number of input/output signals available on the measurement tool. One of the most important outputs of Solve-FPA is the phase of the clocks for the newly inserted flip-flops. Note that the values in the fifth column of Table II are not measured in degrees but in units of 1/256th of a period of the system clock.
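The phase unit used in Table II can be converted to degrees, or to a time offset, with a short calculation. The sketch below is illustrative only: it assumes 256 steps per system clock period (consistent with a value of 128 denoting 180 degrees), and the function names and the 10 ns clock period in the usage lines are hypothetical.

```python
STEPS_PER_PERIOD = 256  # assumption: 128 steps correspond to 180 degrees


def phase_steps_to_degrees(steps: int) -> float:
    """Convert a phase given in clock-period steps to degrees."""
    return 360.0 * steps / STEPS_PER_PERIOD


def phase_steps_to_delay_ns(steps: int, clock_period_ns: float) -> float:
    """Convert a phase given in clock-period steps to a delay in nanoseconds."""
    return clock_period_ns * steps / STEPS_PER_PERIOD


print(phase_steps_to_degrees(128))        # 180.0
print(phase_steps_to_degrees(64))         # 90.0
print(phase_steps_to_delay_ns(64, 10.0))  # 2.5 (hypothetical 10 ns clock period)
```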
For example, values of 128 and 256 would correspond to phases of 180 and 360 degrees, respectively. The number of candidate nodes is the number of CLBs present before any flip-flops are allocated. The final action of Solve-FPA is to reactivate enough CLB flip-flops to correspond to the selected quantity of nodes. Note that selecting more nodes does not necessarily save more energy. Solve-FPA estimates the amount of glitch energy that can be saved. The estimated power reduction takes into account the overhead of the additional flip-flops and of the global wires for clock distribution. Since relative energy is more important, these values may have offsets. The large difference between the measured and estimated energy reduction in Addr14 is due to the small size of the glitch power compared to the power consumed by the logical transitions. Even though almost every glitch is removed by clock insertion, the effect on dynamic

power consumption is not large. As shown in the last three columns of Table II, Solve-FPA actually saves between 8% and % of dynamic energy in real applications. The differences between the computed and measured values result from the assumptions made while modeling the capacitance and transition density of the Virtex-II and from inaccuracies in the measurement tool. In any case, the experimental values showed around 5% variation, depending on the temperature of the target device. We consider the calculated and measured results to be broadly consistent. We also tried slowing down the clock frequency to extend the slack interval of each vertex. When the clock frequency was reduced by 5%, an additional power reduction of around 2% was achieved. This shows that more power saving can indeed be obtained by reducing the clock frequency, which can improve the gain from clock scaling for devices which do not allow a scalable supply voltage.

TABLE II. POWER REDUCTION ACHIEVED BY Solve-FPA.
Columns: target circuit; number of modules; number of slices; number of phases; selected phase(s); number of nodes (selected/candidates); estimated glitch energy reduction; measured dynamic energy consumption (original, Solve-FPA, reduction). Only the circuit names and the measured columns survive in this transcription.

Target circuit   Original        Solve-FPA       Reduction
Addr14           3.69 nJ/clk     3.37 nJ/clk     8.67%
Multiplier14     14.47 nJ/clk    11.48 nJ/clk    20.63%
C                6.27 nJ/clk     6.00 nJ/clk     4.31%
C                5.46 nJ/clk     4.72 nJ/clk     13.41%
C6288            24.89 nJ/clk    16.99 nJ/clk    31.74%
Tap16-D          9.03 nJ/clk     7.88 nJ/clk     12.74%

Table footnotes (selected phases; footnote markers lost): 116, 83, 5, 98, 187, 57; and 107, 117, 231.

Fig. 12. Dynamic energy saving by flip-flop insertion. The energy curves are convex due to the power overheads of the inserted flip-flops. Axes: dynamic energy (nJ/clock) versus the number of phases for flip-flop insertion; curves: C432, C499, Tap16-D, and C6288.

VI. CONCLUSIONS

We have shown how to save power in LUT-architecture FPGAs, based on the observation that most of the power is dissipated in the routing paths, and specifically, in glitch propagation.
We propose the insertion of flip-flops between adjacent LUTs when there are more than two cascaded LUTs. Our method is substantially different from traditional retiming or pipelining schemes, in that the inserted flip-flops are timed by clocks which are phase-shifted to correspond to the delays in LUTs and routing paths, and their slack times. To make the most of limited phase-shift resolutions and clock lines, we insert flip-flops in decreasing order of the power saved by glitch minimization. This is shown to be an NP-complete problem, but we provide an efficient heuristic. The advantages of our method are: 1) we do not change pipeline structures and timing behavior, thanks to the use of phase-shifted clocks; 2) we can start from a post-layout design, independent of other optimizations during earlier design stages; and 3) flip-flop insertion can easily be automated as a post-optimization process at the final design stage. The figures in this paper allow our glitch minimization techniques to be visualized: we show the distribution of glitches by clock phase, and real-chip cycle-true power measurements. Results show savings of up to 38% of the full-chip power. In future work, we will continue to improve the heuristics for the NP-complete problem of allocating shifted-phase clock flip-flops.

REFERENCES

[1] A. Shen, A. Ghosh, S. Devadas, and K. Keutzer, "On average power dissipation and random pattern testability of CMOS combinational logic networks," in The Proceedings of the International Conference on Computer Aided Design, 1992.
[2] M. Favalli and L. Benini, "Analysis of glitch power dissipation in CMOS ICs," in The Proceedings of the International Symposium on Low Power Electronics and Design, April 1995.
[3] E. Kusse and J. M. Rabaey, "Low-energy embedded FPGA structure," in The Proceedings of the International Symposium on Low Power Electronics and Design, 1998.
[4] L. Shang, A. S. Kaviani, and K. Bathala, "Dynamic power consumption in Virtex-II FPGA family," in The Proceedings of the International Symposium on Field Programmable Gate Arrays, February 2002.
[5] S. J. Wilton, S.-S. Ang, and W. Luk, "The impact of pipelining on energy per operation in field-programmable gate arrays," in The Proceedings of the International Conference on Field-Programmable Logic and Applications, August 2004.
[6] K. N. Lalgudi and M. C. Papaefthymiou, "Fixed-phase retiming for low power design," in The Proceedings of the International Symposium on Low Power Electronics and Design.
[7] H. Li and S. Katkoori, "Power minimization algorithms for LUT-based FPGA technology mapping," ACM Transactions on Design Automation of Electronic Systems, vol. 9, no. 1, January 2004.
[8] B. Kumthekar, L. Benini, E. Macii, and F. Somenzi, "Power optimization of FPGA-based designs without rewiring," IEE Proceedings - Computers and Digital Techniques, vol. 147, no. 3, May 2000.
[9] Xilinx Inc., Virtex-II Platform FPGA Handbook, 2000.
[10] Altera Corporation, "Using the ClockLock and ClockBoost PLL features," Altera Application Note, no. 1, November 2003.
[11] F. N. Najm, "Transition density: a new measure of activity in digital circuits," IEEE Transactions on CAD of IC and Systems, vol. 12, no. 2, February.
[12] H. Mehta, M. Borah, R. M. Owens, and M. J. Irwin, "Accurate estimation of combinational circuit activity," in The Proceedings of the 32nd IEEE/ACM Design Automation Conference, June 1995.
[13] J. H. Anderson and F. N. Najm, "Power estimation techniques for FPGAs," IEEE Transactions on VLSI Systems, vol. 12, no. 10, October 2004.
[14] L. Lovász, "Stable set and polynomials," Discrete Mathematics, vol. 124.
[15] H. G. Lee, S. Nam, and N. Chang, "Cycle-accurate energy measurement and high-level energy characterization of FPGAs," in The Proceedings of 4th International Symposium on Quality Electronic Design, March 2003.

Retiming Sequential Circuits for Low Power

Retiming Sequential Circuits for Low Power Retiming Sequential Circuits for Low Power José Monteiro, Srinivas Devadas Department of EECS MIT, Cambridge, MA Abhijit Ghosh Mitsubishi Electric Research Laboratories Sunnyvale, CA Abstract Switching

More information

REDUCING DYNAMIC POWER BY PULSED LATCH AND MULTIPLE PULSE GENERATOR IN CLOCKTREE

REDUCING DYNAMIC POWER BY PULSED LATCH AND MULTIPLE PULSE GENERATOR IN CLOCKTREE Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 5, May 2014, pg.210

More information

International Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS)

International Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS) International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Journal of Emerging Technologies in Computational

More information

Peak Dynamic Power Estimation of FPGA-mapped Digital Designs

Peak Dynamic Power Estimation of FPGA-mapped Digital Designs Peak Dynamic Power Estimation of FPGA-mapped Digital Designs Abstract The Peak Dynamic Power Estimation (P DP E) problem involves finding input vector pairs that cause maximum power dissipation (maximum

More information

L11/12: Reconfigurable Logic Architectures

L11/12: Reconfigurable Logic Architectures L11/12: Reconfigurable Logic Architectures Acknowledgements: Materials in this lecture are courtesy of the following people and used with permission. - Randy H. Katz (University of California, Berkeley,

More information

An FPGA Implementation of Shift Register Using Pulsed Latches

An FPGA Implementation of Shift Register Using Pulsed Latches An FPGA Implementation of Shift Register Using Pulsed Latches Shiny Panimalar.S, T.Nisha Priscilla, Associate Professor, Department of ECE, MAMCET, Tiruchirappalli, India PG Scholar, Department of ECE,

More information

Field Programmable Gate Arrays (FPGAs)

Field Programmable Gate Arrays (FPGAs) Field Programmable Gate Arrays (FPGAs) Introduction Simulations and prototyping have been a very important part of the electronics industry since a very long time now. Before heading in for the actual

More information

data and is used in digital networks and storage devices. CRC s are easy to implement in binary

data and is used in digital networks and storage devices. CRC s are easy to implement in binary Introduction Cyclic redundancy check (CRC) is an error detecting code designed to detect changes in transmitted data and is used in digital networks and storage devices. CRC s are easy to implement in

More information

Figure.1 Clock signal II. SYSTEM ANALYSIS

Figure.1 Clock signal II. SYSTEM ANALYSIS International Journal of Advances in Engineering, 2015, 1(4), 518-522 ISSN: 2394-9260 (printed version); ISSN: 2394-9279 (online version); url:http://www.ijae.in RESEARCH ARTICLE Multi bit Flip-Flop Grouping

More information

Why FPGAs? FPGA Overview. Why FPGAs?

Why FPGAs? FPGA Overview. Why FPGAs? Transistor-level Logic Circuits Positive Level-sensitive EECS150 - Digital Design Lecture 3 - Field Programmable Gate Arrays (FPGAs) January 28, 2003 John Wawrzynek Transistor Level clk clk clk Positive

More information

L12: Reconfigurable Logic Architectures

L12: Reconfigurable Logic Architectures L12: Reconfigurable Logic Architectures Acknowledgements: Materials in this lecture are courtesy of the following sources and are used with permission. Frank Honore Prof. Randy Katz (Unified Microelectronics

More information

Low Power VLSI Circuits and Systems Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Low Power VLSI Circuits and Systems Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Low Power VLSI Circuits and Systems Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No. # 29 Minimizing Switched Capacitance-III. (Refer

More information

Leakage Current Reduction in Sequential Circuits by Modifying the Scan Chains

Leakage Current Reduction in Sequential Circuits by Modifying the Scan Chains eakage Current Reduction in Sequential s by Modifying the Scan Chains Afshin Abdollahi University of Southern California (3) 592-3886 afshin@usc.edu Farzan Fallah Fujitsu aboratories of America (48) 53-4544

More information

Objectives. Combinational logics Sequential logics Finite state machine Arithmetic circuits Datapath

Objectives. Combinational logics Sequential logics Finite state machine Arithmetic circuits Datapath Objectives Combinational logics Sequential logics Finite state machine Arithmetic circuits Datapath In the previous chapters we have studied how to develop a specification from a given application, and

More information

Modifying the Scan Chains in Sequential Circuit to Reduce Leakage Current

Modifying the Scan Chains in Sequential Circuit to Reduce Leakage Current IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 3, Issue 1 (Sep. Oct. 2013), PP 01-09 e-issn: 2319 4200, p-issn No. : 2319 4197 Modifying the Scan Chains in Sequential Circuit to Reduce Leakage

More information

EL302 DIGITAL INTEGRATED CIRCUITS LAB #3 CMOS EDGE TRIGGERED D FLIP-FLOP. Due İLKER KALYONCU, 10043

EL302 DIGITAL INTEGRATED CIRCUITS LAB #3 CMOS EDGE TRIGGERED D FLIP-FLOP. Due İLKER KALYONCU, 10043 EL302 DIGITAL INTEGRATED CIRCUITS LAB #3 CMOS EDGE TRIGGERED D FLIP-FLOP Due 16.05. İLKER KALYONCU, 10043 1. INTRODUCTION: In this project we are going to design a CMOS positive edge triggered master-slave

More information

EN2911X: Reconfigurable Computing Topic 01: Programmable Logic. Prof. Sherief Reda School of Engineering, Brown University Fall 2014

EN2911X: Reconfigurable Computing Topic 01: Programmable Logic. Prof. Sherief Reda School of Engineering, Brown University Fall 2014 EN2911X: Reconfigurable Computing Topic 01: Programmable Logic Prof. Sherief Reda School of Engineering, Brown University Fall 2014 1 Contents 1. Architecture of modern FPGAs Programmable interconnect

More information

SYNCHRONOUS DERIVED CLOCK AND SYNTHESIS OF LOW POWER SEQUENTIAL CIRCUITS *

SYNCHRONOUS DERIVED CLOCK AND SYNTHESIS OF LOW POWER SEQUENTIAL CIRCUITS * SYNCHRONOUS DERIVED CLOCK AND SYNTHESIS OF LOW POWER SEUENTIAL CIRCUITS * Wu Xunwei (Department of Electronic Engineering Hangzhou University Hangzhou 328) ing Wu Massoud Pedram (Department of Electrical

More information

Prototyping an ASIC with FPGAs. By Rafey Mahmud, FAE at Synplicity.

Prototyping an ASIC with FPGAs. By Rafey Mahmud, FAE at Synplicity. Prototyping an ASIC with FPGAs By Rafey Mahmud, FAE at Synplicity. With increased capacity of FPGAs and readily available off-the-shelf prototyping boards sporting multiple FPGAs, it has become feasible

More information

Sequencing. Lan-Da Van ( 范倫達 ), Ph. D. Department of Computer Science National Chiao Tung University Taiwan, R.O.C. Fall,

Sequencing. Lan-Da Van ( 范倫達 ), Ph. D. Department of Computer Science National Chiao Tung University Taiwan, R.O.C. Fall, Sequencing ( 范倫達 ), Ph. D. Department of Computer Science National Chiao Tung University Taiwan, R.O.C. Fall, 2013 ldvan@cs.nctu.edu.tw http://www.cs.nctu.edu.tw/~ldvan/ Outlines Introduction Sequencing

More information

Reconfigurable Architectures. Greg Stitt ECE Department University of Florida

Reconfigurable Architectures. Greg Stitt ECE Department University of Florida Reconfigurable Architectures Greg Stitt ECE Department University of Florida How can hardware be reconfigurable? Problem: Can t change fabricated chip ASICs are fixed Solution: Create components that can

More information

Bit Swapping LFSR and its Application to Fault Detection and Diagnosis Using FPGA

Bit Swapping LFSR and its Application to Fault Detection and Diagnosis Using FPGA Bit Swapping LFSR and its Application to Fault Detection and Diagnosis Using FPGA M.V.M.Lahari 1, M.Mani Kumari 2 1,2 Department of ECE, GVPCEOW,Visakhapatnam. Abstract The increasing growth of sub-micron

More information

Combining Dual-Supply, Dual-Threshold and Transistor Sizing for Power Reduction

Combining Dual-Supply, Dual-Threshold and Transistor Sizing for Power Reduction Combining Dual-Supply, Dual-Threshold and Transistor Sizing for Reduction Stephanie Augsburger 1, Borivoje Nikolić 2 1 Intel Corporation, Enterprise Processors Division, Santa Clara, CA, USA. 2 Department

More information

Report on 4-bit Counter design Report- 1, 2. Report on D- Flipflop. Course project for ECE533

Report on 4-bit Counter design Report- 1, 2. Report on D- Flipflop. Course project for ECE533 Report on 4-bit Counter design Report- 1, 2. Report on D- Flipflop Course project for ECE533 I. Objective: REPORT-I The objective of this project is to design a 4-bit counter and implement it into a chip

More information

FPGA Glitch Power Analysis and Reduction

FPGA Glitch Power Analysis and Reduction FPGA Glitch Power Analysis and Reduction Warren Shum and Jason H. Anderson Department of Electrical and Computer Engineering, University of Toronto Toronto, ON. Canada {shumwarr, janders}@eecg.toronto.edu

More information

Combinational vs Sequential

Combinational vs Sequential Combinational vs Sequential inputs X Combinational Circuits outputs Z A combinational circuit: At any time, outputs depends only on inputs Changing inputs changes outputs No regard for previous inputs

More information

WINTER 15 EXAMINATION Model Answer

WINTER 15 EXAMINATION Model Answer Important Instructions to examiners: 1) The answers should be examined by key words and not as word-to-word as given in the model answer scheme. 2) The model answer and the answer written by candidate

More information

Investigation of Look-Up Table Based FPGAs Using Various IDCT Architectures

Investigation of Look-Up Table Based FPGAs Using Various IDCT Architectures Investigation of Look-Up Table Based FPGAs Using Various IDCT Architectures Jörn Gause Abstract This paper presents an investigation of Look-Up Table (LUT) based Field Programmable Gate Arrays (FPGAs)

More information

Designing for High Speed-Performance in CPLDs and FPGAs

Designing for High Speed-Performance in CPLDs and FPGAs Designing for High Speed-Performance in CPLDs and FPGAs Zeljko Zilic, Guy Lemieux, Kelvin Loveless, Stephen Brown, and Zvonko Vranesic Department of Electrical and Computer Engineering University of Toronto,

More information

Power Optimization by Using Multi-Bit Flip-Flops

Power Optimization by Using Multi-Bit Flip-Flops Volume-4, Issue-5, October-2014, ISSN No.: 2250-0758 International Journal of Engineering and Management Research Page Number: 194-198 Power Optimization by Using Multi-Bit Flip-Flops D. Hazinayab 1, K.

More information

OF AN ADVANCED LUT METHODOLOGY BASED FIR FILTER DESIGN PROCESS

OF AN ADVANCED LUT METHODOLOGY BASED FIR FILTER DESIGN PROCESS IMPLEMENTATION OF AN ADVANCED LUT METHODOLOGY BASED FIR FILTER DESIGN PROCESS 1 G. Sowmya Bala 2 A. Rama Krishna 1 PG student, Dept. of ECM. K.L.University, Vaddeswaram, A.P, India, 2 Assistant Professor,

More information

HIGH PERFORMANCE AND LOW POWER ASYNCHRONOUS DATA SAMPLING WITH POWER GATED DOUBLE EDGE TRIGGERED FLIP-FLOP

HIGH PERFORMANCE AND LOW POWER ASYNCHRONOUS DATA SAMPLING WITH POWER GATED DOUBLE EDGE TRIGGERED FLIP-FLOP HIGH PERFORMANCE AND LOW POWER ASYNCHRONOUS DATA SAMPLING WITH POWER GATED DOUBLE EDGE TRIGGERED FLIP-FLOP 1 R.Ramya, 2 C.Hamsaveni 1,2 PG Scholar, Department of ECE, Hindusthan Institute Of Technology,

More information

LOW POWER AND HIGH PERFORMANCE SHIFT REGISTERS USING PULSED LATCH TECHNIQUE

LOW POWER AND HIGH PERFORMANCE SHIFT REGISTERS USING PULSED LATCH TECHNIQUE OI: 10.21917/ijme.2018.0088 LOW POWER AN HIGH PERFORMANCE SHIFT REGISTERS USING PULSE LATCH TECHNIUE Vandana Niranjan epartment of Electronics and Communication Engineering, Indira Gandhi elhi Technical

More information

An Efficient High Speed Wallace Tree Multiplier

An Efficient High Speed Wallace Tree Multiplier Chepuri satish,panem charan Arur,G.Kishore Kumar and G.Mamatha 38 An Efficient High Speed Wallace Tree Multiplier Chepuri satish, Panem charan Arur, G.Kishore Kumar and G.Mamatha Abstract: The Wallace

More information

VLSI Technology used in Auto-Scan Delay Testing Design For Bench Mark Circuits

VLSI Technology used in Auto-Scan Delay Testing Design For Bench Mark Circuits VLSI Technology used in Auto-Scan Delay Testing Design For Bench Mark Circuits N.Brindha, A.Kaleel Rahuman ABSTRACT: Auto scan, a design for testability (DFT) technique for synchronous sequential circuits.

More information

Latch-Based Performance Optimization for FPGAs. Xiao Teng

Latch-Based Performance Optimization for FPGAs. Xiao Teng Latch-Based Performance Optimization for FPGAs by Xiao Teng A thesis submitted in conformity with the requirements for the degree of Master of Applied Science Graduate Department of ECE University of Toronto

More information

Australian Journal of Basic and Applied Sciences. Design of SRAM using Multibit Flipflop with Clock Gating Technique

Australian Journal of Basic and Applied Sciences. Design of SRAM using Multibit Flipflop with Clock Gating Technique ISSN:1991-8178 Australian Journal of Basic and Applied Sciences Journal home page: www.ajbasweb.com Design of SRAM using Multibit Flipflop with Clock Gating Technique 1 Divya R. and 2 Hemalatha K.L. 1

More information

High Performance Dynamic Hybrid Flip-Flop For Pipeline Stages with Methodical Implanted Logic

High Performance Dynamic Hybrid Flip-Flop For Pipeline Stages with Methodical Implanted Logic High Performance Dynamic Hybrid Flip-Flop For Pipeline Stages with Methodical Implanted Logic K.Vajida Tabasum, K.Chandra Shekhar Abstract-In this paper we introduce a new high performance dynamic hybrid

More information

Interconnect Planning with Local Area Constrained Retiming

Interconnect Planning with Local Area Constrained Retiming Interconnect Planning with Local Area Constrained Retiming Ruibing Lu and Cheng-Kok Koh School of Electrical and Computer Engineering Purdue University,West Lafayette, IN, 47907, USA {lur, chengkok}@ecn.purdue.edu

More information

Random Access Scan. Veeraraghavan Ramamurthy Dept. of Electrical and Computer Engineering Auburn University, Auburn, AL

Random Access Scan. Veeraraghavan Ramamurthy Dept. of Electrical and Computer Engineering Auburn University, Auburn, AL Random Access Scan Veeraraghavan Ramamurthy Dept. of Electrical and Computer Engineering Auburn University, Auburn, AL ramamve@auburn.edu Term Paper for ELEC 7250 (Spring 2005) Abstract: Random Access

More information

GlitchLess: An Active Glitch Minimization Technique for FPGAs

GlitchLess: An Active Glitch Minimization Technique for FPGAs GlitchLess: An Active Glitch Minimization Technique for FPGAs Julien Lamoureux, Guy G. Lemieux, Steven J.E. Wilton Department of Electrical and Computer Engineering University of British Columbia Vancouver,

More information

Reduction of Clock Power in Sequential Circuits Using Multi-Bit Flip-Flops

Reduction of Clock Power in Sequential Circuits Using Multi-Bit Flip-Flops Reduction of Clock Power in Sequential Circuits Using Multi-Bit Flip-Flops A.Abinaya *1 and V.Priya #2 * M.E VLSI Design, ECE Dept, M.Kumarasamy College of Engineering, Karur, Tamilnadu, India # M.E VLSI

More information

Design of Fault Coverage Test Pattern Generator Using LFSR

Design of Fault Coverage Test Pattern Generator Using LFSR Design of Fault Coverage Test Pattern Generator Using LFSR B.Saritha M.Tech Student, Department of ECE, Dhruva Institue of Engineering & Technology. Abstract: A new fault coverage test pattern generator

More information

International Journal of Scientific & Engineering Research, Volume 5, Issue 9, September ISSN

International Journal of Scientific & Engineering Research, Volume 5, Issue 9, September ISSN International Journal of Scientific & Engineering Research, Volume 5, Issue 9, September-2014 917 The Power Optimization of Linear Feedback Shift Register Using Fault Coverage Circuits K.YARRAYYA1, K CHITAMBARA

More information

ELEN Electronique numérique

ELEN Electronique numérique ELEN0040 - Electronique numérique Patricia ROUSSEAUX Année académique 2014-2015 CHAPITRE 5 Sequential circuits design - Timing issues ELEN0040 5-228 1 Sequential circuits design 1.1 General procedure 1.2

More information

Abstract 1. INTRODUCTION. Cheekati Sirisha, IJECS Volume 05 Issue 10 Oct., 2016 Page No Page 18532

Abstract 1. INTRODUCTION. Cheekati Sirisha, IJECS Volume 05 Issue 10 Oct., 2016 Page No Page 18532 www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 5 Issue 10 Oct. 2016, Page No. 18532-18540 Pulsed Latches Methodology to Attain Reduced Power and Area Based

More information

A Low Power Delay Buffer Using Gated Driver Tree

A Low Power Delay Buffer Using Gated Driver Tree IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) ISSN: 2319 4200, ISBN No. : 2319 4197 Volume 1, Issue 4 (Nov. - Dec. 2012), PP 26-30 A Low Power Delay Buffer Using Gated Driver Tree Kokkilagadda

More information

Clock Tree Power Optimization of Three Dimensional VLSI System with Network

Clock Tree Power Optimization of Three Dimensional VLSI System with Network Clock Tree Power Optimization of Three Dimensional VLSI System with Network M.Saranya 1, S.Mahalakshmi 2, P.Saranya Devi 3 PG Student, Dept. of ECE, Syed Ammal Engineering College, Ramanathapuram, Tamilnadu,

More information

11. Sequential Elements

11. Sequential Elements 11. Sequential Elements Jacob Abraham Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 2017 October 11, 2017 ECE Department, University of Texas at Austin

More information

Power Efficient Design of Sequential Circuits using OBSC and RTPG Integration

Power Efficient Design of Sequential Circuits using OBSC and RTPG Integration Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 2, Issue. 9, September 2013,

More information

Optimization of Multi-Channel BCH Error Decoding for Common Cases. Russell Dill Master's Thesis Defense April 20, 2015

Optimization of Multi-Channel BCH Error Decoding for Common Cases. Russell Dill Master's Thesis Defense April 20, 2015 Optimization of Multi-Channel BCH Error Decoding for Common Cases Russell Dill Master's Thesis Defense April 20, 2015 Bose-Chaudhuri-Hocquenghem (BCH) BCH is an Error Correcting Code (ECC) and is used

More information
