Bus-Switch Coding, for Dynamic Power Management in off-chip communication channels.

Mauro Olivieri *, Francesco Pappalardo ** and Giuseppe Visalli **

* Department of Electronic Engineering, University of La Sapienza, Rome, Italy
** Advanced System Technology, ST Microelectronics, Catania, Italy

Olivieri@dei.uniroma1.it, Francesco.Pappalardo@st.com, Giuseppe-ast.Visalli@st.com

ABSTRACT

Dynamic power management (DPM) is an important challenge in extending the battery lifetime of a portable system. Power management based on static, off-line approaches does not consider a basic property of a modern battery, which recovers a fraction of its charge during idle time. The DPM approach profiles a complex system into different power figures depending on a reduced set of macro-states. The solution of the DPM problem is a sequence of macro-states that increases the battery lifetime. Dynamic power management is also required in complex systems where the power dissipated in communication channels is a dominant factor. Modern communication arrangements operate at rates of some Gbit/s, which imply high transition activity, responsible for the dynamic power consumption. Moreover, the signal level at the output pads contributes quadratically to the dynamic power. The problem of low-power bus encoding has been extensively tackled in the past. The basic approach minimizes the transition density, which is directly related to the switching activity of the lines and is responsible for charging and discharging the parasitic capacitances. The solutions in the current literature on low-power bus encoding do not guarantee good activity savings as the number of bus lines increases; this is a serious limitation in modern communication channels, which require high transmission bandwidth. This paper introduces a novel low-power bus encoding approach, based on tentatively encoding, clustering and re-ordering the lines of a wide system data bus used in a multi-processor scenario.
The Bus-Switch mechanism, as a novel bus encoding approach, drastically reduces the transition activity while preserving the bandwidth required for high data-rate communications. Since the complexity of the optimal bus switch encoder grows significantly as the level of clustering decreases, a sub-optimal approach requires a power management policy in order to control the battery life effectively. The paper presents an overview of the bus-switch mechanism, including the architecture required for encoding and decoding the input lines. The RTL model has been mapped onto a modern 90 nm low-leakage technology library using the Synopsys tool for placement and CTS (Physical Compiler v sp1 with Minimum Physical Constraint). By means of Synopsys tools we investigated the integrity with respect to the ANSI C behavioral model (VCS and custom PLI), the efficiency of coding (Power Compiler) and the critical paths (Primetime). Moreover, the paper addresses the basic guidelines for a software-controlled dynamic power management policy for communication channels, in order to increase the battery life. We present a run-time power management system that controls the configuration of the sub-optimal bus switch encoder, depending on a battery model that considers both the discharge and recovery effects. The simulations indicate an increase of up to 10% in battery life, operating at 500 Kbit/s using a Pentium IV platform. The sub-optimal bus switch encoder, controlled by an efficient dynamic power management system, represents a good design-performance trade-off.

Table of Contents

1.0 Introduction
2.0 The Bus Switch Mechanism
3.0 The BS VLSI design implementation
4.0 The power management policy for communication channels
5.0 Simulation Results
6.0 BS efficiency measured by means of Synopsys tools
7.0 Conclusion and Discussions

Table of Figures

Figure 1 the roadmap for bus-speed in the Intel's family processors
Figure 2 the Bus Switch Activity performance (Lines=32 Cluster depth=4)
Figure 3 the BS activity savings, varying the bus input lines, compared to the known algorithms
Figure 4 implementation of the swap operation (M=4). The basic swap box
Figure 5 implementation of the swap operation (M=4). The complete swap unit
Figure 6 twin swap unit for the implementation of coding functions II and III
Figure 7 architecture of the single unit inside the encoder
Figure 8 the L-way BS encoder
Figure 9 the BS decoder (Type I)
Figure 10 the BS decoder (Type II)
Figure 11 the BS decoder (Type III)
Figure 12 a multi-processor architecture, with BS-based communication channels
Figure 13 the run-time software-controlled power management system
Figure 14 the minimum bus capacitance for convenient BS at 90 nm
Figure 15 the 2-pipe stages BS encoder (Type III): 32 lines, 4 ways and 16 optimized patterns
Figure 16 BS critical path at 90nm and 10ns clock cycle, using Primetime
Table 1 examples of pattern and inverse pattern
Table 2 different BS types, varying the coding-decoding functions
Table 3 switching activity reduction of BS, referring to the three coding proposed functions
Table 4 switching activity reduction of BS, assuming a reduced reordering pattern set (16 over 24 possible patterns). Bus width=32, M=4
Table 5 the increased battery lifetime, using a run-time DPM based on BS approach
Table 6 timing report for 2-pipe BS at 90nm (fast clock 5 ns)

1.0 Introduction

Nowadays, the challenge of reducing power dissipation in VLSI systems has the primary goal of decreasing energy while keeping other performance constraints acceptable. Power-aware design methodology can be effectively applied to the interconnection media, which are responsible for most of the dynamic power consumption in a multi-processor system. Over the last decade, the speed of the interconnection channels in the Intel processor family has followed a quadratic law (Figure 1). In particular, the information rate is very close to the operating frequency, requiring high-speed communication channels. Most communication channels employ parallel buses for transmitting information at high data rates. A single bus line with capacitance C, working at frequency 1/T, dissipates a dynamic power:

$$P = \frac{1}{2} \cdot C \cdot V_{DD}^{2} \cdot \alpha \cdot \frac{1}{T} \qquad (1.1)$$

The dynamic power also depends on the power supply V_DD and on the transition activity α, which accounts for the 0->1 and 1->0 logic transitions. The technological approach to low-power buses decreases the signal level (V_DD) or the bus frequency (1/T). In off-chip channels the signal level cannot always be decreased, for signal integrity reasons. Additionally, a reduced bus frequency introduces a strong bottleneck for a high-speed core with a high information demand. The Bus Switch (BS) [8] [12] mechanism is a possible answer for very low-power bus encoding that preserves the bandwidth required for high-speed transmission. It is based on tentatively encoding, clustering and reordering the input lines according to a reordering scheme. The complexity of the bus-switch encoder restricts its use to off-chip communication scenarios, where the power saved on the parasitic loads exceeds the energy required for coding. The hardware required for bus switch encoding can be effectively reduced by operating in a sub-optimal mode: the encoder performs a subset of the trials, operating with an optimized subset of reordering schemes. The scheme selection comes from either an on-line or an off-line analysis.

The paper briefly introduces the bus switch mechanism, including the VLSI circuit implementation for a set of encoding approaches. We report the performance in terms of activity savings obtained by simulating, with ANSI C behavioral encoder/decoder models, traffic typical of a modern system on a chip: binaries and multimedia files. Moreover, a complete synthesis flow has been carried out using the Synopsys Physical Compiler tool and a 90 nm technology library from ST Microelectronics. The performance-power trade-off can be improved by a run-time dynamic power management policy. This work illustrates how the configuration of a sub-optimal BS encoder can be changed on the fly, permitting the use of a light BS implementation at the same activity saving performance. The proposed dynamic power management policy enables power states that dynamically increase the battery lifetime in a portable device. The effectiveness of the proposed approach has been demonstrated by discrete-time simulations [1] in ANSI C of the dynamic power manager and an accurate battery model, which considers both the discharge and recovery effects [9]. A complete BS encoder operates at 50 MHz with 32 lines, which implies an information rate of 1.6 Gbit/s. This circuit saves 30% in toggle activity, operating with a bus capacitance between 2 pF and 4 pF (typical values for off-chip lines in the target technology). The run-time DPM manager, whose aim is to increase the battery lifetime, permits a transmission rate of 500 Kbit/s, operating on a Pentium IV based platform. The battery lifetime is increased by up to 10%. The sub-optimal encoder employed increases the range of bus capacitance for which the BS approach is convenient. The sub-optimal bus switch encoder, controlled by an efficient dynamic power management system, represents a good design-performance trade-off.

The paper is organized as follows. Section 2 introduces the BS mechanism. The BS VLSI circuit implementation is explained in section 3. Section 4 illustrates the DPM policy for communication channels that use the BS encoder/decoder. Section 5 reports the simulation results: the activity savings of the BS mechanism for both the complete and the reduced architecture, and the gain in battery lifetime obtained with the proposed run-time DPM. Section 6 investigates, by means of Synopsys tools, possible enhancements in the VLSI design, performing power estimation and static timing analysis (STA). Lastly, section 7 is reserved for our conclusions and points of discussion.

2.0 The Bus-Switch Mechanism

In principle, the BS technique can be logically expressed as a four-step process:

1. A large bus is divided into several identical clusters of M (cluster depth) lines each.
2. Each M-line bus is coded by reordering the input lines using a particular reordering pattern.
3. A tentative data encoding is obtained by applying a fixed coding function to the swapped M lines.
4. The process is repeated M! times from step 2 until the optimal reordering pattern is found, that is, the one that minimizes the output switching activity in the encoded data of the whole bus.

A formal definition of the process follows. Let b(t) be the input data word to the bus encoder and B_opt(t) the encoded data word on the bus, at clock cycle t. The single bits of any M-bit data word x(t) are denoted x(t)(0), x(t)(1), ..., x(t)(M-1).

Definition 1. A reordering pattern p(t) is an ordered set of M indices i_0 ... i_{M-1}, associated with clock cycle t. Given a data word x(t), the swap operator S_w with reordering pattern p is a combinational logic function producing a swapped data word y(t) = S_w(x(t), p(t)), such that: y(t)(0) = x(t)(i_0), y(t)(1) = x(t)(i_1), ..., y(t)(M-1) = x(t)(i_{M-1}).
As an example, if p(t) = "1,2,3,0" and x(t) = "0100", then S_w(x(t), p(t)) = "1000". Note that each reordering pattern p(t) has a unique inverse p^-1(t), such that x(t) = S_w[S_w(x(t), p(t)), p^-1(t)]. Examples of reordering patterns and their inverses are shown in Table 1.

Definition 2. A coding function is a combinational logic function producing a data word B(t) by applying swapping to b(t) and employing any other words resulting from input or output observation.
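The swap operator and the inverse pattern can be sketched in a few lines of C. This is only an illustration of Definition 1, not the authors' behavioral model: the function names `bs_swap` and `bs_inverse` are hypothetical, and data words are represented as strings with bit 0 leftmost so that the example from the text can be reproduced literally.

```c
#include <assert.h>
#include <string.h>

#define M 4  /* cluster depth used in the running example */

/* Swap operator S_w: y[k] = x[p[k]] for k = 0..M-1 (Definition 1).
 * Words are strings of '0'/'1' with bit 0 leftmost, as in the text. */
static void bs_swap(const char *x, const int p[M], char *y)
{
    for (int k = 0; k < M; k++)
        y[k] = x[p[k]];
    y[M] = '\0';
}

/* Inverse pattern q of p: q[p[k]] = k, so that S_w(S_w(x, p), q) == x. */
static void bs_inverse(const int p[M], int q[M])
{
    for (int k = 0; k < M; k++)
        q[p[k]] = k;
}
```

Calling `bs_swap("0100", p, y)` with p = {1, 2, 3, 0} yields "1000", and `bs_inverse` returns {3, 0, 1, 2} for the same pattern; swapping "1000" with that inverse restores "0100".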

Definition 3. The optimal reordering pattern p_opt(t) is the reordering pattern that minimizes the switching activity $H\left[B_{opt}(t-1) \oplus B(t)\right]$, where H is the Hamming distance between the previous transmission B_opt(t-1) and the coding function result B(t), varying the reordering pattern.

Figure 2 illustrates the activity savings obtained by stimulating the bus with binary and multimedia benchmarks and employing the following coding and decoding functions:

$$B(t) = S_w\big(b(t), p(t)\big) \oplus S_w\big(B(t-1), p^{-1}(t)\big)$$

$$b(t) = S_w\Big[\big(S_w(B(t-1), p^{-1}(t)) \oplus B(t)\big),\ p^{-1}(t)\Big]$$

The bus switch mechanism involves several design-performance trade-offs:

The pattern has to be transmitted on extra lines to allow exact decoding. Since the pattern occupies only a few additional lines, a known bus encoding (Bus Invert, Adaptive Bus Invert, etc.) can be applied to them for further activity savings.

The bus switch encoder (BSE) grows in complexity as the cluster depth M increases. In particular, the optimal BS allows M! different patterns. This suggests employing a reduced and optimized subset of reordering patterns, decreasing the hardware complexity. Figure 2 shows that a reduced and optimized pattern set (M=4 and 16 patterns, BS4X (16)) causes no relevant performance degradation with respect to the optimal scheme (OPT-BS4X (24)). The pattern selection can be performed on-line or off-line (Figure 2), driven by the activity savings.

The activity savings do not change significantly when the number of bus input lines varies (Figure 3).

Off-chip buses are the ideal field of application for the BS mechanism. The physical implementation in 130 nm low-leakage technology suggests a convenient use of BS systems in buses with loads from 2 to 4 pF, typical values for off-chip buses [8].

There are several degrees of freedom in choosing the coding function. As an example, we illustrate the changes in the micro-architecture using the coding functions listed in Table 2.

3.0 The BS VLSI design implementation.
This section illustrates the Bus Switch VLSI circuit implementation, starting from the bottom-level architecture and moving up to the top.

3.1 Encoder

The reordering patterns can be generated sequentially by a finite state machine (FSM) very similar to a binary counter. The direct binary representation of a reordering pattern is a vector of M binary numbers, each ranging from 0 to M-1, therefore requiring M log2 M bits. The swap operation is performed by a set of multiplexers as in Figures 4 and 5. For M = 4, the 8-bit pattern is partitioned into four 2-bit numbers, namely A, B, C, D in Figure 4.
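The direct pattern representation and a counter-like pattern generator can be modeled in software as follows. This is a minimal sketch under stated assumptions: the bit position assigned to each 2-bit index and the lexicographic visiting order are choices made here for illustration, and the names `pattern_pack`, `pattern_next` and `pattern_count` are hypothetical; the paper does not specify how the FSM orders the patterns.

```c
#include <assert.h>
#include <stdint.h>

#define M 4            /* cluster depth */
#define BITS_PER_IDX 2 /* log2(M) bits per index */

/* Pack a pattern into M*log2(M) = 8 bits.
 * Assumed layout: index i_k occupies bits [2k+1 : 2k]. */
static uint8_t pattern_pack(const int p[M])
{
    uint8_t w = 0;
    for (int k = 0; k < M; k++)
        w |= (uint8_t)(p[k] << (BITS_PER_IDX * k));
    return w;
}

/* Advance p to the next permutation in lexicographic order.
 * Returns 0 when p was already the last one (the counter wraps). */
static int pattern_next(int p[M])
{
    int i = M - 2;
    while (i >= 0 && p[i] >= p[i + 1]) i--;
    if (i < 0) return 0;
    int j = M - 1;
    while (p[j] <= p[i]) j--;
    int t = p[i]; p[i] = p[j]; p[j] = t;
    for (int a = i + 1, b = M - 1; a < b; a++, b--) {
        t = p[a]; p[a] = p[b]; p[b] = t;
    }
    return 1;
}

/* Visit every pattern starting from the identity; returns how many were seen. */
static int pattern_count(void)
{
    int p[M] = {0, 1, 2, 3};
    int n = 1;
    while (pattern_next(p)) n++;
    return n;
}
```

With M = 4 the generator visits all M! = 24 patterns, and each one fits in the 8-bit direct representation mentioned above.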

Coding function I is directly the swapped word. Coding functions II and III are implemented by a twin swap unit, illustrated in Figure 6; the conversion from a reordering pattern to its inverse is implemented directly by a dedicated two-level combinational logic unit, PConv. A fully sequential implementation of a BS encoder would require the unit to perform M! sequential attempts before selecting the best pattern and the corresponding encoded word. This would imply an operating clock frequency at least M! times faster than the bus operating frequency. More conveniently, a partially or fully parallel implementation can be pursued, employing L units, each performing M! / L attempts. In the following we refer to this solution as an L-way parallel architecture. The corresponding architecture for the single unit is shown in Figure 7. PatGen is the FSM that generates the set of allowed patterns to be tried; H produces the Hamming distance between two words by performing a population count after XOR-ing. The Cmp unit compares the Hamming distance actually measured with the temporary minimum. When all the patterns have been tried and the minimum distance has been found, the threshold unit stores the pattern, the encoded word and the distance value in output registers. Figure 8 shows the top view of the encoder architecture.

The pattern transmission over the bus deserves special attention. Though a direct representation requires M log2 M bits, the actual number of valid patterns is at most M! in a full pattern set, and even less in a reduced pattern set. Therefore, by introducing a dedicated combinational compressor that transforms the direct representation into a symbolic binary representation, the extra lines needed to transmit the patterns are at most log2 M! (rounded up to an integer). In addition, in order to minimize the switching activity on those extra lines, a conventional Bus Invert (BI) coding is used on them.
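The search performed by the L-way architecture can be mimicked in software. The sketch below is an assumption-laden illustration, not the RTL: it uses coding function I only (the encoded word is the swapped word), models the L units as an outer loop, and enumerates patterns by index through a hypothetical helper `nth_pattern`. Here bit k of a word is the k-th least significant bit.

```c
#include <assert.h>
#include <limits.h>

#define M 4      /* cluster depth */
#define NPAT 24  /* M! patterns in the full set */
#define WAYS 4   /* L parallel units, each trying NPAT / WAYS patterns */

/* H unit: Hamming distance = population count after XOR-ing. */
static int hamming(unsigned a, unsigned b)
{
    unsigned d = a ^ b;
    int n = 0;
    while (d) { n += (int)(d & 1u); d >>= 1; }
    return n;
}

/* Swap unit on an M-bit word: output bit k takes input bit p[k]. */
static unsigned swap_word(unsigned x, const int p[M])
{
    unsigned y = 0;
    for (int k = 0; k < M; k++)
        y |= ((x >> p[k]) & 1u) << k;
    return y;
}

/* Write the n-th permutation of {0..M-1} (factorial number system) into p. */
static void nth_pattern(int n, int p[M])
{
    int pool[M] = {0, 1, 2, 3};
    int f = NPAT;
    for (int k = 0; k < M; k++) {
        f /= (M - k);
        int idx = n / f;
        n %= f;
        p[k] = pool[idx];
        for (int j = idx; j < M - 1 - k; j++) pool[j] = pool[j + 1];
    }
}

/* One bus cycle for one cluster with coding function I.
 * Returns the minimum distance; stores the chosen pattern index and word. */
static int encode_cluster(unsigned b, unsigned prev, int *best_pat, unsigned *best_word)
{
    int best = INT_MAX;
    for (int way = 0; way < WAYS; way++) {  /* the units run in parallel in hardware */
        for (int n = way * (NPAT / WAYS); n < (way + 1) * (NPAT / WAYS); n++) {
            int p[M];
            nth_pattern(n, p);
            unsigned cand = swap_word(b, p);
            int d = hamming(cand, prev);    /* Cmp: compare with the running minimum */
            if (d < best) { best = d; *best_pat = n; *best_word = cand; }
        }
    }
    return best;
}
```

For instance, with input 0011 and previous bus word 1100 the search finds a pattern that produces 1100 again, so the cluster does not toggle at all. A reordering cannot change the number of ones in the word, so the minimum distance is bounded below by the difference in population count between the two words.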
3.2 Decoder

When using coding function I, the decoder architecture is directly the one depicted in the top part of Figure 9. The PConv combinational block performs the pattern conversion to obtain the inverse pattern. In addition, a BI decoding unit and a combinational de-compressor process the extra lines dedicated to pattern transmission, to reconstruct the direct representation of the transmitted pattern. Figures 10 and 11 show the architecture of the decoder for coding functions II and III, respectively.

4.0 The power management policy for communication channels.

We explain, in two different and independent ways, how BS-based communication channels increase the battery lifetime in a multi-processor system. The theoretical approach formalizes the problem in accordance with the current scientific literature on DPM. The practical point of view helps system engineers towards an easy implementation. The sub-optimal BS approach considers a reduced set of trials, which implies a circuit with decreased area and energy dissipation. This set of trials represents the current BSE configuration. The proposed software-controlled DPM policy simulates the BS encoder with every possible configuration, estimating the battery lifetime. This is the common point of the following subsections. The RT-DPM policy considers a set of possible configurations for encoding the input stream and searches for the best one, that is, the one that minimizes the battery's discharge rate parameter, which depends directly on the bus transition activity. In this section we consider an L-line bus, divided into identical clusters of M lines each.

4.1 Practical Realization of the proposed DPM policy.

The proposed DPM policy is essentially based on a run-time software controller strategy. The combined hardware-software co-design permits an easy implementation in multi-processor systems based on the bus switch approach. These are the essential requirements:

1. The RTL model of a sub-optimal Bus Switch Encoder / Decoder. Experimental results indicate, for cluster depth M=4, a minimum pattern set of R=8 different reordering patterns.
2. The RTL model of a BS-Driver, which transmits the current configuration to every BS encoder in the system.
3. An original subset of I reordering patterns coming from an off-line analysis. This analysis selects the most used patterns, which represent the initial set.
4. A software BS model used for simulating the battery lifetime. This ANSI C code needs to be as fast as possible; compiler optimizations for speed can be used.
5. A software model of the considered battery. This model has to take into account the discharge rate and the recovery effect. The discharge rate depends on the total switching activity measured by the BS simulator. This parameter is linked to the dynamic power only: the static current does not depend on the BS configuration, so it is neglected. The recovery effect depends on the bus idle time. The estimated traffic rate in each considered bus is used for calculating the recovery effect parameter. The current literature indicates that this parameter depends strongly on the actual battery charge (battery state).
6. A real-time clock for measurements. Alternatively, a cycle-accurate ISS simulator can be used.

The DPM policy admits a number of power states (that is, configurations) equal to the number of possible combinations of I patterns taken R at a time:

$$S = \binom{I}{R} = \frac{I!}{R!\,(I-R)!} \qquad (4.1.1)$$

The DPM manager makes S trials, measuring the total switching activity. This activity permits the calculation of the discharge rate parameter.
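As a quick check of equation (4.1.1), the number of power states and the selection of the best trial can be sketched as below. The function names `bs_num_states` and `bs_best_state` are hypothetical, and the `activity` array stands in for the per-trial measurements that the software BS model would produce; how those measurements are obtained is outside this sketch.

```c
#include <assert.h>

/* Number of configurations S = I! / (R! (I - R)!), computed incrementally
 * so that no full factorial is ever formed. */
static unsigned long bs_num_states(unsigned i_set, unsigned r_used)
{
    if (r_used > i_set) return 0;
    if (r_used > i_set - r_used) r_used = i_set - r_used;  /* C(I,R) == C(I,I-R) */
    unsigned long s = 1;
    for (unsigned k = 1; k <= r_used; k++)
        s = s * (i_set - r_used + k) / k;  /* the division is exact at every step */
    return s;
}

/* Index of the trial with the lowest measured switching activity.
 * activity[] holds one measurement per trial (hypothetical input). */
static unsigned long bs_best_state(const double *activity, unsigned long s)
{
    unsigned long best = 0;
    for (unsigned long k = 1; k < s; k++)
        if (activity[k] < activity[best]) best = k;
    return best;
}
```

With I = 11 and R = 8, the values used in the simulations of section 5.2, `bs_num_states(11, 8)` gives the S = 165 states quoted there.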
In particular, each channel has a percentage of activity given by the ratio of actual activity to maximum activity (see 4.2.1). The discharge rate parameter is the percentage of switching activity with respect to the total activity over all buses (see 4.2.2). The discrete-time simulation permits the battery simulation in order to calculate the total lifetime. The battery state represents the internal charge. The ANSI C code below shows the battery simulation step, which depends on the calculated discharge rate and recovery effects. The battery is discharged when the internal state is zero. After S trials, the DPM manager identifies the best configuration and updates, through the BS-drivers, the BS encoders in the system.

    #include <stdlib.h>

    /* Assumed layout: the paper does not list the battery type. */
    typedef struct {
        double discharge_rate;  /* probability of losing one charge unit per step */
        double *recovery;       /* recovery[s]: threshold used when the state is s */
        int current_state;      /* remaining charge units */
    } battery;

    /* Assumed helper: uniform random real in [0, 1). */
    static double uniform(void) { return rand() / (RAND_MAX + 1.0); }

    void sim_step(battery *p)
    {
        double num;

        num = uniform();  /* Generate a random real positive number less than 1.0 */
        if (num < p->discharge_rate)
            p->current_state = p->current_state - 1;  /* The battery loses one charge unit */
        else if (num < *(p->recovery + p->current_state))  /* The recovery effect depends on the actual state */
            p->current_state = p->current_state + 1;  /* The battery recovers one charge unit */
    }

4.2 Formal Approach to the proposed DPM policy.

The software-controlled DPM approach enables novel configurations of the embedded system in order to increase the battery lifetime. This subsection introduces the methodology for controlling, via software, the power dissipation in multiprocessor systems that use the BS mechanism for low-energy interconnections (Figure 12). The BS encoder performs a number of trials that grows factorially with the number of input lines in the cluster. A reduced and optimized set of reordering patterns does not significantly affect the activity savings, in both on-line and off-line configuration modes. Let us consider an L-line bus, divided into identical clusters of M lines each. The initial pattern set comprises M! different permutations. We operate with an initial reduced set of I reordering patterns ($I \le M!$) obtained from an off-line analysis. Moreover, in order to reduce the hardware, the actual BSE uses R different patterns ($R \le I$). The proposed DPM policy considers a much reduced set of BS configurations, given by the possible combinations of the input pattern subset (see 4.1.1). The proposed DPM policy admits S different power states for each channel, which imply different bus transition densities as the input traffic varies. Let us assume an initial power state $\tilde{S}_b^{0}$ for channel b. The power manager (Figure 13) analyzes N different transmissions; $\tilde{S}_b^{kN}$ represents the power state at step k.
This system performs S different trials, employing a software BS model and measuring the transition density:

$$\alpha_b(\tilde{S}) = \frac{\mathrm{bus\_activity}}{\mathrm{max\_bus\_activity}} = \frac{\sum_{i=1}^{L} n_{x_i}(N \cdot T, \tilde{S})}{N \cdot L} \qquad (4.2.1)$$

The BS system, which employs the current configuration $\tilde{S}$, encodes the input traffic at rate 1/T, introducing a number of transitions $n_{x_i}$ in the generic i-th bus line. The battery's discharge rate parameter used in [9] depends strongly on the total transition density. If the system admits B different channels:

$$\mathrm{discharge\_rate} = \frac{\sum_{b=1}^{B} \alpha_b(\tilde{S}_b^{kN})}{B} \qquad (4.2.2)$$

The charge recovery depends on the bus idle time during the time interval [0, NT]. Since the recovery effect does not depend on the bus transition activity, the power state at step k+1 minimizes the discharge rate only:

$$\left[\tilde{S}_1^{(k+1)N}, \ldots, \tilde{S}_B^{(k+1)N}\right] = \arg\min_{\tilde{S}_1, \ldots, \tilde{S}_B} \sum_{b=1}^{B} \alpha_b(\tilde{S}_b) \qquad (4.2.3)$$

5.0 Simulation Results.

5.1 The complete BS system.

The RTL architectural description of the complete BS encoder has been mapped onto a 90 nm technology library at 50 MHz. This frequency corresponds to the minimum latency of a 4-way encoder for a 32-line bus with cluster depth 4 (BS Type III). Under these operating conditions the proposed encoder achieves activity savings of more than 20% with respect to un-coded transmission (Table 3). We considered the transmission of binaries and multimedia data at clock rate; these benchmarks come from a Linux distribution. The use of a reduced pattern set, obtained from an off-line analysis, does not significantly change the performance, as illustrated in Table 4 and Figure 2. The 50 MHz clock permits transmission at 1.6 Gbit/s.

5.2 The DPM policy, which employs a sub-optimal BS system.

The proposed DPM policy changes the current encoder configuration on the fly in order to increase the battery lifetime in a multi-processor system. We derived an analytical model of the battery behavior, which considers both the Recovery and Rate Capacity effects [9]. These effects depend on the bus idle time and on the transition density related to the current configuration. The simulations consider the transmission of binary and multimedia streams with an idle time uniformly distributed in [0, NT/10]. In particular, a 32-line system bus has been encoded with the BS mechanism, employing coding function III, cluster depth 4 and R=8 different patterns out of an original set of I=11 (BS4X (11), S=165). The current BS configuration is updated every N=20 transmissions (slots).
The analyzed-slots / updated-slots ratio has been fixed to 20:20. The battery model considers 3000 charge units, with a time constant of 97.5 ms. The RT-DPM achieves a battery lifetime increase of up to 10% with respect to a BS-based interconnect system without power management policies (Table 5). These benchmarks have been simulated on the ST200 embedded core platform at 400 MHz from ST Microelectronics. The time for profiling (~60 ms) limited the communication channel bandwidth to 10 Kbit/s. A more attractive communication bandwidth could be reached by operating with a different analyzed-slots / updated-slots ratio. For example, operating with a ratio of 20:500 (the state is updated every 500 slots), the Pentium IV core processor executes the run-time power manager in less than 1 microsecond, permitting a transmission rate of 500 Kbit/s.
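A battery lifetime figure of the kind reported in Table 5 can be obtained by iterating the simulation step of section 4.1 until the charge is exhausted. The sketch below is a hedged illustration: the struct layout, the `capacity` bound on recovery, the small deterministic generator and the function `battery_lifetime` are assumptions introduced here, and the discharge and recovery values are left to the caller because the paper does not list them.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    double discharge_rate;   /* probability of losing one charge unit per step */
    const double *recovery;  /* recovery[s]: threshold applied when the state is s */
    int current_state;       /* remaining charge units */
    int capacity;            /* upper bound on the state (assumption) */
} battery;

/* Small deterministic generator so that runs are repeatable (assumption). */
static unsigned long rng_state = 12345UL;
static double uniform(void)
{
    rng_state = (rng_state * 1103515245UL + 12345UL) & 0x7fffffffUL;
    return (double)rng_state / 2147483648.0;  /* value in [0, 1) */
}

/* One discrete time step, following the structure of the paper's sim_step. */
static void sim_step(battery *p)
{
    double num = uniform();
    if (num < p->discharge_rate)
        p->current_state -= 1;  /* the battery loses one charge unit */
    else if (num < p->recovery[p->current_state] && p->current_state < p->capacity)
        p->current_state += 1;  /* the battery recovers one charge unit */
}

/* Number of steps until the battery is empty, capped at max_steps. */
static long battery_lifetime(battery *p, long max_steps)
{
    long t = 0;
    while (p->current_state > 0 && t < max_steps) {
        sim_step(p);
        t++;
    }
    return t;
}
```

Running the loop once per candidate configuration, each with its own discharge rate from (4.2.2), gives the lifetime estimates that the power manager compares. A lower discharge rate can only lengthen the simulated lifetime, since at most one charge unit is lost per step.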

6.0 BS Efficiency measured by means of Synopsys Tools.

The proposed low-power bus encoding scheme has been validated by checking the alignment of the RTL with the ANSI C behavioral model using Synopsys VCS and custom PLIs. Additionally, its convenience with respect to known bus encoding approaches has been validated by measuring the BS encoder power dissipation with Synopsys Power Compiler. In particular, we calculated the gain in energy compared to un-coded transmission in a typical off-chip communication scenario: the PCI local bus. Finally, a real implementation cannot neglect the impact of critical timing arcs, which set the actual cut-off frequency for a convenient use of the BS approach. The Primetime tool helped us explore timing-critical paths in order to extend the BS field of application.

6.1 VCS validated the behavioral ANSI C BS-model.

The activity savings reported in Tables 3 and 4 derive from an ANSI C bus switch model, which analyzes benchmarks used in Linux OS: LaTeX distribution, Berkeley Spice, Gcc compiler and samples of Jpeg, MP3 and AVI files. The RTL validation compared the produced waveforms with the output of the C model. In particular, we compared the sequence of reordering patterns and the output bus versus time. The circuit model (Verilog) needs particular interfaces (PLIs) for stimulating the bus accesses during the benchmarks. VCS successfully validated our RTL model for bus switch encoding.

6.2 Power Compiler demonstrated the efficiency of coding

The efficiency of coding can be demonstrated by calculating the energy savings per cycle with respect to an un-coded transmission. The proposed benchmarks have also been used for gate-level simulation, in order to estimate the encoder power consumption.
The total balance of average energy saving per bus cycle is therefore:

$$E_{saved} = 0.5 \cdot \mathrm{switching\_reduction} \cdot C_{bus} \cdot V_{dd}^{2} - \mathrm{energy\_overhead}$$

where energy_overhead represents the BS encoder energy dissipation; the total energy saving percentage is expressed by the ratio

$$E_{\%} = \frac{0.5 \cdot C_{bus} \cdot V_{dd}^{2} - E_{saved}}{0.5 \cdot C_{bus} \cdot V_{dd}^{2}} \cdot 100\%$$

A value of E% lower than 100% means that the BS is effective in reducing the total energy consumed per bus cycle, while E% greater than 100% means that the bus capacitance is so small that the energy overhead of the encoder dominates and the BS technique is inappropriate. Referring to the 90 nm implementation of BS-III (32-bit bus, 4-bit cluster size, 16 patterns), we can show the dependency of E% on the bus line capacitance and compare it with typical applications. Figure 14 shows the results for the 4-way implementation. We considered the typical PCI bus electrical specification: VDD = 5 V.

6.3 Primetime explored time critical paths for increasing the maximum allowed bus frequency.

The BS encoder is a time-critical circuit, due to the presence of a secondary fast clock (used by PatGen) in addition to the bus clock. Additionally, the large number of multiplexing elements introduces a cut-off frequency that is hard to overcome. This subsection summarizes the main results of static timing analysis (STA) by means of the Synopsys Primetime tool. Figure 15 illustrates the BSE architecture used, with 2 pipeline stages and the Type III coding and decoding functions. As far as the pipeline architecture is concerned, coding functions I and II permit pipelining, whereas BS-III allows at most a one-stage pipeline due to the presence of B(t-1). Figure 16 shows the timing-critical path in a BSE mapped onto the 90 nm technology library from ST Microelectronics. The STA (Table 6) shows that the massive multiplexing (Swap Unit) produces timing-critical arcs, which are one of the main limitations on increasing the operating frequency. For this reason, future research could investigate custom library cells for performing swapping operations. Additionally, the Hamming distance module, which accounts for 33% of the considered worst path, requires timing optimization.

7.0 Conclusions

The Bus Switch mechanism is an interesting low-power bus encoding scheme for an off-chip wide system bus. The multi-processor environment is the optimal field of application for encoding data with the proposed approach. The hardware complexity can be reduced with substantially unchanged performance, but at the cost of increasing the minimum bus capacitance for convenient utilization. This permits a wide use of the proposed solution, in particular for increasing the battery lifetime in a portable device by an appropriate dynamic power management policy. The proposed run-time policy permits interesting activity savings with a data rate aligned with current low-power bus solutions [3]. The BS approach might represent the starting point for energy-efficient, high data-rate multi-processor networks.

References

[1] L. Benini, A. Bogliolo and G. De Micheli, "Dynamic Power Management of Electronic Systems", in Int. Conf. on Computer-Aided Design (1998)
[2] L. Benini, A. Macii, E. Macii, M. Poncino and R. Scarsi, "Architectures and Synthesis Algorithms for Power-Efficient Bus Interfaces", IEEE Trans. on Computer-Aided Design, Vol. 19, No. 9 (2000)
[3] K. Lahiri and A. Raghunathan, "Communication Based Power Management", in IEEE Design and Test of Computers (2002)
[4] J. Lorch and A. Smith, "Software Strategies for Portable Computer Energy Management", in IEEE Personal Communications (1998)
[5] Y. Lu and G. De Micheli, "Comparing System-Level Power Management policies", in IEEE Design and Test of Computers (2001)
[6] Y. Lu, T. Simunic and G. De Micheli, "Software Controlled Power Management"
[7] F. Najm, "Transition Density: A new measure of Activity in Digital Circuits", in IEEE Transaction on Computer-Aided Design, Vol. 12, No. 2 (1993)
[8] M. Olivieri, F. Pappalardo and G. Visalli, "Bus Switch coding for reducing power dissipation in off-chip buses", IEEE Transaction on Very Large Scale Integration, Vol. 12, No. 12, December 2004
[9] D. Panigrahi, C. Chiasserini, S. Dey, R. Rao, A. Raghunathan and K. Lahiri, "Battery Life Estimation of Mobile Embedded Systems", in 14th International Conference VLSI Design (2001)
[10] R. Siegmund, C. Kretzschmar, and D. Muller, "Adaptive Bus Encoding Technique for Switching Activity Reduced Data Transfer over Wide System Buses", Proceeding of PATMOS (2000)
[11] M. R. Stan and W.P. Burleson, "Bus-Invert coding for low-power I/O", IEEE Transaction on VLSI Systems, Vol. 3, pp. 49-58 (1995)
[12] G. Visalli and F. Pappalardo, "Process and device for reducing the bus switching activity and computer program therefore", US Application Patent

Tables

P    p
Table 1: examples of pattern and inverse pattern

BS Type   Encoding Function            Decoding Function
I.        B(t) = SW(b(t), p(t))        b(t) = SW(B(t), p^-1(t))
II.       B( t) = SW ( ( t), p( t )" SW ( ( t ), p ( t )    ( t) = S w S w ( t ), p ( t) [( ( )" B( t ), p ( t) ]
III.      B( t) = S ( ( t), p( t )" S B( t ), p ( t)    ( t) = S S B( t ), p ( t) W w ( ) [( ( )" B( t ), p ( t) ]
Table 2: different BS types, varying the coding-decoding functions

Files    BS I      BS II     BS III
LaTeX    6.64 %    %         %
Spice    4.30 %    %         %
Gcc      5.90 %    %         %
Jpeg     %         %         %
Mp3      %         %         %
Avi      2.74 %    3.30 %    %
Table 3: switching activity reduction of BS, referring to the three coding functions proposed. Bus width=32, M=4

Files    BS I      BS II     BS III
LaTeX    7.14 %    %         %
Spice    5.16 %    %         %
Gcc      6.47 %    %         %
Jpeg     %         %         %
Mp3      %         %         %
Avi      4.80 %    3.54 %    %
Table 4: switching activity reduction of BS, assuming a reduced reordering pattern set (16 over 24 possible patterns). Bus width=32, M=4
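The Type I row of Table 2 can be illustrated with a minimal Python sketch. It rests on assumptions that the tables do not confirm: the 32 lines are taken to be split into M = 4 clusters of 8 lines, a pattern is taken to be a permutation of those clusters (which gives 4! = 24 patterns, matching the count in the Table 4 caption), and p(t) is chosen by exhaustive search for the smallest Hamming distance to the previous bus word. The names `swap`, `inverse`, `hamming`, `encode` and `decode` are hypothetical and are not taken from the paper.

```python
from itertools import permutations

WIDTH = 32                      # bus width used in Tables 3 and 4
M = 4                           # assumed: number of clusters
CLUSTER_BITS = WIDTH // M       # assumed: 8 lines per cluster
MASK = (1 << CLUSTER_BITS) - 1

# Assumed pattern set: every permutation of the M clusters (4! = 24).
PATTERNS = list(permutations(range(M)))


def swap(word, pattern):
    """SW(word, pattern): output cluster i is input cluster pattern[i]."""
    out = 0
    for i, src in enumerate(pattern):
        cluster = (word >> (src * CLUSTER_BITS)) & MASK
        out |= cluster << (i * CLUSTER_BITS)
    return out


def inverse(pattern):
    """Return the pattern that undoes `pattern` under swap()."""
    inv = [0] * len(pattern)
    for i, src in enumerate(pattern):
        inv[src] = i
    return tuple(inv)


def hamming(a, b):
    """Number of lines that differ between two bus words."""
    return bin(a ^ b).count("1")


def encode(b_t, prev_bus):
    """Type I encoding: B(t) = SW(b(t), p(t)), with p(t) picked to
    minimize the Hamming distance to the previous bus word (assumed policy)."""
    p_t = min(PATTERNS, key=lambda p: hamming(swap(b_t, p), prev_bus))
    return swap(b_t, p_t), p_t


def decode(bus_t, p_t):
    """Type I decoding: b(t) = SW(B(t), p^-1(t))."""
    return swap(bus_t, inverse(p_t))


if __name__ == "__main__":
    prev = 0
    for word in (0xDEADBEEF, 0x12345678, 0x0000FFFF):
        bus, p = encode(word, prev)
        print(hex(word), "->", hex(bus), "pattern", p,
              "toggles", hamming(bus, prev))
        prev = bus
```

Because the identity permutation is always among the candidates, the encoded word in this sketch never toggles more lines than the unencoded word would. The sketch ignores the cost of sending p(t) to the decoder, which a real comparison would have to include.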

Benchmark   Battery Lifetime (sec) (no RT-DPM)   Battery Lifetime (sec) (RT-BSE)   Gain %
Latex                                                                              %
Spice                                                                              %
Gcc                                                                                %
Jpeg                                                                               %
Mp3                                                                                %
Avi                                                                                %
Table 5: the increased battery lifetime, using a run-time DPM based on the BS approach

Module                       Latency (ns)   Percentage of fast clock
Swap Unit                    0.500          10.0%
H (Hamming)                  1.650          33.0%
PConv (Pattern Conversion)   0.475          9.5%
Threshold Unit               0.340          6.8%
Patgen                       0.440          8.8%
Network Delay                1.000          20.0%
Other                        0.595          11.9%
Table 6: timing report for the 2-pipe BS at 90 nm (fast clock 5 ns)

Figures

[Figure 1 chart: "Processor Core vs. Bus-Clock"; vertical axis: Clock Rate (MHz); horizontal axis: DX2, 486DX4, P-100, P-150, P-200, PII-300, PII-400, PII-500, PIII-600, PIII-700, PIII-800, PIV-1G, PIV-1.5G; series: Bus-Clock, Processor]
Figure 1: the roadmap for bus speed in the Intel family of processors in the last decade
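The percentages in Table 6 follow directly from the module latencies and the 5 ns fast clock. The short Python sketch below redoes that arithmetic as a consistency check; the dictionary and function names are hypothetical, and the values are copied from Table 6.

```python
FAST_CLOCK_NS = 5.0  # fast clock period stated in the Table 6 caption

# Module latencies in nanoseconds, copied from Table 6.
LATENCIES_NS = {
    "Swap Unit": 0.500,
    "H (Hamming)": 1.650,
    "PConv (Pattern Conversion)": 0.475,
    "Threshold Unit": 0.340,
    "Patgen": 0.440,
    "Network Delay": 1.000,
    "Other": 0.595,
}


def share_of_clock(latency_ns, clock_ns=FAST_CLOCK_NS):
    """Latency expressed as a percentage of the clock period."""
    return 100.0 * latency_ns / clock_ns


# Per-module share of the clock period, rounded as in Table 6.
SHARES = {name: round(share_of_clock(ns), 1) for name, ns in LATENCIES_NS.items()}
TOTAL_NS = sum(LATENCIES_NS.values())

if __name__ == "__main__":
    for name, pct in SHARES.items():
        print(f"{name:28s} {LATENCIES_NS[name]:.3f} ns  {pct:.1f}%")
    print(f"{'Total':28s} {TOTAL_NS:.3f} ns")
```

The listed latencies add up to the full 5 ns period, and the Hamming distance module alone takes about a third of it, which is the share quoted in the timing discussion.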

Figure 2: the Bus Switch activity performance (Lines=32, Cluster depth=4)
Figure 3: the BS activity savings, varying the bus input lines, compared to the known algorithms
Figure 4: implementation of the swap operation (M=4). The basic swap box
Figure 5: implementation of the swap operation (M=4). The complete swap unit
Figure 6: twin swap unit for the implementation of coding functions II and III
Figure 7: architecture of the single unit inside the encoder

Figure 8: the L-way BS encoder
Figure 9: the BS decoder (Type I)
Figure 10: the BS decoder (Type II)
Figure 11: the BS decoder (Type III)
Figure 12: a multi-processor architecture, with BS-based communication channels
Figure 13: the run-time software-controlled power management system

[Figure 14 chart: "4-way 16 patterns 90nm implementation"; vertical axis: E%; horizontal axis: bus capacitance [pF]; legend: LaTeX, Spice, Gcc, Jpeg, MP3, AVI, Reference]
Figure 14: the minimum bus capacitance for convenient BS at 90 nm
Figure 15: the 2-pipe stages BS encoder (Type III): 32 lines, 4 ways and 16 optimized patterns

Figure 16: BS critical path at 90 nm and 20 ns clock cycle, using PrimeTime

Interframe Bus Encoding Technique for Low Power Video Compression

Interframe Bus Encoding Technique for Low Power Video Compression Interframe Bus Encoding Technique for Low Power Video Compression Asral Bahari, Tughrul Arslan and Ahmet T. Erdogan School of Engineering and Electronics, University of Edinburgh United Kingdom Email:

More information

A Novel Bus Encoding Technique for Low Power VLSI

A Novel Bus Encoding Technique for Low Power VLSI A Novel Bus Encoding Technique for Low Power VLSI Jayapreetha Natesan and Damu Radhakrishnan * Department of Electrical and Computer Engineering State University of New York 75 S. Manheim Blvd., New Paltz,

More information

A Genetic Approach To Bus Encoding

A Genetic Approach To Bus Encoding A Genetic Approach To Bus Encoding Giuseppe Ascia Vincenzo Catania Maurizio Palesi Antonio Parlato Dipartimento di Ingegneria Informatica e delle Telecomunicazioni University of Catania, Italy Abstract

More information

Design of Fault Coverage Test Pattern Generator Using LFSR

Design of Fault Coverage Test Pattern Generator Using LFSR Design of Fault Coverage Test Pattern Generator Using LFSR B.Saritha M.Tech Student, Department of ECE, Dhruva Institue of Engineering & Technology. Abstract: A new fault coverage test pattern generator

More information

Low Power VLSI Circuits and Systems Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Low Power VLSI Circuits and Systems Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Low Power VLSI Circuits and Systems Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No. # 29 Minimizing Switched Capacitance-III. (Refer

More information

Figure.1 Clock signal II. SYSTEM ANALYSIS

Figure.1 Clock signal II. SYSTEM ANALYSIS International Journal of Advances in Engineering, 2015, 1(4), 518-522 ISSN: 2394-9260 (printed version); ISSN: 2394-9279 (online version); url:http://www.ijae.in RESEARCH ARTICLE Multi bit Flip-Flop Grouping

More information

Leakage Current Reduction in Sequential Circuits by Modifying the Scan Chains

Leakage Current Reduction in Sequential Circuits by Modifying the Scan Chains eakage Current Reduction in Sequential s by Modifying the Scan Chains Afshin Abdollahi University of Southern California (3) 592-3886 afshin@usc.edu Farzan Fallah Fujitsu aboratories of America (48) 53-4544

More information

Interframe Bus Encoding Technique and Architecture for MPEG-4 AVC/H.264 Video Compression

Interframe Bus Encoding Technique and Architecture for MPEG-4 AVC/H.264 Video Compression Interframe Encoding Technique and Architecture for MPEG-4 AVC/H.264 Video Compression Asral Bahari, Tughrul Arslan and Ahmet T. Erdogan Abstract In this paper, we propose an implementation of a data encoder

More information

Modifying the Scan Chains in Sequential Circuit to Reduce Leakage Current

Modifying the Scan Chains in Sequential Circuit to Reduce Leakage Current IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 3, Issue 1 (Sep. Oct. 2013), PP 01-09 e-issn: 2319 4200, p-issn No. : 2319 4197 Modifying the Scan Chains in Sequential Circuit to Reduce Leakage

More information

Asynchronous IC Interconnect Network Design and Implementation Using a Standard ASIC Flow

Asynchronous IC Interconnect Network Design and Implementation Using a Standard ASIC Flow Asynchronous IC Interconnect Network Design and Implementation Using a Standard ASIC Flow Bradley R. Quinton*, Mark R. Greenstreet, Steven J.E. Wilton*, *Dept. of Electrical and Computer Engineering, Dept.

More information

Objectives. Combinational logics Sequential logics Finite state machine Arithmetic circuits Datapath

Objectives. Combinational logics Sequential logics Finite state machine Arithmetic circuits Datapath Objectives Combinational logics Sequential logics Finite state machine Arithmetic circuits Datapath In the previous chapters we have studied how to develop a specification from a given application, and

More information

A Low Power Delay Buffer Using Gated Driver Tree

A Low Power Delay Buffer Using Gated Driver Tree IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) ISSN: 2319 4200, ISBN No. : 2319 4197 Volume 1, Issue 4 (Nov. - Dec. 2012), PP 26-30 A Low Power Delay Buffer Using Gated Driver Tree Kokkilagadda

More information

Transactions Briefs. Interframe Bus Encoding Technique and Architecture for MPEG-4 AVC/H.264 Video Compression

Transactions Briefs. Interframe Bus Encoding Technique and Architecture for MPEG-4 AVC/H.264 Video Compression IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 18, NO. 5, MAY 2010 831 Transactions Briefs Interframe Bus Encoding Technique and Architecture for MPEG-4 AVC/H.264 Video Compression

More information

Random Access Scan. Veeraraghavan Ramamurthy Dept. of Electrical and Computer Engineering Auburn University, Auburn, AL

Random Access Scan. Veeraraghavan Ramamurthy Dept. of Electrical and Computer Engineering Auburn University, Auburn, AL Random Access Scan Veeraraghavan Ramamurthy Dept. of Electrical and Computer Engineering Auburn University, Auburn, AL ramamve@auburn.edu Term Paper for ELEC 7250 (Spring 2005) Abstract: Random Access

More information

A low-power portable H.264/AVC decoder using elastic pipeline

A low-power portable H.264/AVC decoder using elastic pipeline Chapter 3 A low-power portable H.64/AVC decoder using elastic pipeline Yoshinori Sakata, Kentaro Kawakami, Hiroshi Kawaguchi, Masahiko Graduate School, Kobe University, Kobe, Hyogo, 657-8507 Japan Email:

More information

Power Optimization by Using Multi-Bit Flip-Flops

Power Optimization by Using Multi-Bit Flip-Flops Volume-4, Issue-5, October-2014, ISSN No.: 2250-0758 International Journal of Engineering and Management Research Page Number: 194-198 Power Optimization by Using Multi-Bit Flip-Flops D. Hazinayab 1, K.

More information

Peak Dynamic Power Estimation of FPGA-mapped Digital Designs

Peak Dynamic Power Estimation of FPGA-mapped Digital Designs Peak Dynamic Power Estimation of FPGA-mapped Digital Designs Abstract The Peak Dynamic Power Estimation (P DP E) problem involves finding input vector pairs that cause maximum power dissipation (maximum

More information

Low Power Digital Design using Asynchronous Logic

Low Power Digital Design using Asynchronous Logic San Jose State University SJSU ScholarWorks Master's Theses Master's Theses and Graduate Research Spring 2011 Low Power Digital Design using Asynchronous Logic Sathish Vimalraj Antony Jayasekar San Jose

More information

Retiming Sequential Circuits for Low Power

Retiming Sequential Circuits for Low Power Retiming Sequential Circuits for Low Power José Monteiro, Srinivas Devadas Department of EECS MIT, Cambridge, MA Abhijit Ghosh Mitsubishi Electric Research Laboratories Sunnyvale, CA Abstract Switching

More information

Sharif University of Technology. SoC: Introduction

Sharif University of Technology. SoC: Introduction SoC Design Lecture 1: Introduction Shaahin Hessabi Department of Computer Engineering System-on-Chip System: a set of related parts that act as a whole to achieve a given goal. A system is a set of interacting

More information

Power Reduction Techniques for a Spread Spectrum Based Correlator

Power Reduction Techniques for a Spread Spectrum Based Correlator Power Reduction Techniques for a Spread Spectrum Based Correlator David Garrett (garrett@virginia.edu) and Mircea Stan (mircea@virginia.edu) Center for Semicustom Integrated Systems University of Virginia

More information

REDUCING DYNAMIC POWER BY PULSED LATCH AND MULTIPLE PULSE GENERATOR IN CLOCKTREE

REDUCING DYNAMIC POWER BY PULSED LATCH AND MULTIPLE PULSE GENERATOR IN CLOCKTREE Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 5, May 2014, pg.210

More information

Combining Dual-Supply, Dual-Threshold and Transistor Sizing for Power Reduction

Combining Dual-Supply, Dual-Threshold and Transistor Sizing for Power Reduction Combining Dual-Supply, Dual-Threshold and Transistor Sizing for Reduction Stephanie Augsburger 1, Borivoje Nikolić 2 1 Intel Corporation, Enterprise Processors Division, Santa Clara, CA, USA. 2 Department

More information

data and is used in digital networks and storage devices. CRC s are easy to implement in binary

data and is used in digital networks and storage devices. CRC s are easy to implement in binary Introduction Cyclic redundancy check (CRC) is an error detecting code designed to detect changes in transmitted data and is used in digital networks and storage devices. CRC s are easy to implement in

More information

International Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS)

International Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS) International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Journal of Emerging Technologies in Computational

More information

VLSI Design: 3) Explain the various MOSFET Capacitances & their significance. 4) Draw a CMOS Inverter. Explain its transfer characteristics

VLSI Design: 3) Explain the various MOSFET Capacitances & their significance. 4) Draw a CMOS Inverter. Explain its transfer characteristics 1) Explain why & how a MOSFET works VLSI Design: 2) Draw Vds-Ids curve for a MOSFET. Now, show how this curve changes (a) with increasing Vgs (b) with increasing transistor width (c) considering Channel

More information

Research Article Low Power 256-bit Modified Carry Select Adder

Research Article Low Power 256-bit Modified Carry Select Adder Research Journal of Applied Sciences, Engineering and Technology 8(10): 1212-1216, 2014 DOI:10.19026/rjaset.8.1086 ISSN: 2040-7459; e-issn: 2040-7467 2014 Maxwell Scientific Publication Corp. Submitted:

More information

Optimization of Multi-Channel BCH Error Decoding for Common Cases. Russell Dill Master's Thesis Defense April 20, 2015

Optimization of Multi-Channel BCH Error Decoding for Common Cases. Russell Dill Master's Thesis Defense April 20, 2015 Optimization of Multi-Channel BCH Error Decoding for Common Cases Russell Dill Master's Thesis Defense April 20, 2015 Bose-Chaudhuri-Hocquenghem (BCH) BCH is an Error Correcting Code (ECC) and is used

More information

Implementation of Low Power and Area Efficient Carry Select Adder

Implementation of Low Power and Area Efficient Carry Select Adder International Journal of Engineering Science Invention ISSN (Online): 2319 6734, ISSN (Print): 2319 6726 Volume 3 Issue 8 ǁ August 2014 ǁ PP.36-48 Implementation of Low Power and Area Efficient Carry Select

More information

Performance Evolution of 16 Bit Processor in FPGA using State Encoding Techniques

Performance Evolution of 16 Bit Processor in FPGA using State Encoding Techniques Performance Evolution of 16 Bit Processor in FPGA using State Encoding Techniques Madhavi Anupoju 1, M. Sunil Prakash 2 1 M.Tech (VLSI) Student, Department of Electronics & Communication Engineering, MVGR

More information

POWER OPTIMIZED CLOCK GATED ALU FOR LOW POWER PROCESSOR DESIGN

POWER OPTIMIZED CLOCK GATED ALU FOR LOW POWER PROCESSOR DESIGN POWER OPTIMIZED CLOCK GATED ALU FOR LOW POWER PROCESSOR DESIGN 1 L.RAJA, 2 Dr.K.THANUSHKODI 1 Prof., Department of Electronics and Communication Engineeering, Angel College of Engineering and Technology,

More information

Performance Driven Reliable Link Design for Network on Chips

Performance Driven Reliable Link Design for Network on Chips Performance Driven Reliable Link Design for Network on Chips Rutuparna Tamhankar Srinivasan Murali Prof. Giovanni De Micheli Stanford University Outline Introduction Objective Logic design and implementation

More information

Hardware Implementation of Viterbi Decoder for Wireless Applications

Hardware Implementation of Viterbi Decoder for Wireless Applications Hardware Implementation of Viterbi Decoder for Wireless Applications Bhupendra Singh 1, Sanjeev Agarwal 2 and Tarun Varma 3 Deptt. of Electronics and Communication Engineering, 1 Amity School of Engineering

More information

Design and Analysis of Modified Fast Compressors for MAC Unit

Design and Analysis of Modified Fast Compressors for MAC Unit Design and Analysis of Modified Fast Compressors for MAC Unit Anusree T U 1, Bonifus P L 2 1 PG Student & Dept. of ECE & Rajagiri School of Engineering & Technology 2 Assistant Professor & Dept. of ECE

More information

A Novel Approach for Auto Clock Gating of Flip-Flops

A Novel Approach for Auto Clock Gating of Flip-Flops A Novel Approach for Auto Clock Gating of Flip-Flops Kakarla Sandhya Rani 1, Krishna Prasad Satamraju 2 1 P.G Scholar, Department of ECE, Vasireddy Venkatadri Institute of Technology, Nambur, Guntur (dt),

More information

An FPGA Implementation of Shift Register Using Pulsed Latches

An FPGA Implementation of Shift Register Using Pulsed Latches An FPGA Implementation of Shift Register Using Pulsed Latches Shiny Panimalar.S, T.Nisha Priscilla, Associate Professor, Department of ECE, MAMCET, Tiruchirappalli, India PG Scholar, Department of ECE,

More information

Static Timing Analysis for Nanometer Designs

Static Timing Analysis for Nanometer Designs J. Bhasker Rakesh Chadha Static Timing Analysis for Nanometer Designs A Practical Approach 4y Spri ringer Contents Preface xv CHAPTER 1: Introduction / 1.1 Nanometer Designs 1 1.2 What is Static Timing

More information

EL302 DIGITAL INTEGRATED CIRCUITS LAB #3 CMOS EDGE TRIGGERED D FLIP-FLOP. Due İLKER KALYONCU, 10043

EL302 DIGITAL INTEGRATED CIRCUITS LAB #3 CMOS EDGE TRIGGERED D FLIP-FLOP. Due İLKER KALYONCU, 10043 EL302 DIGITAL INTEGRATED CIRCUITS LAB #3 CMOS EDGE TRIGGERED D FLIP-FLOP Due 16.05. İLKER KALYONCU, 10043 1. INTRODUCTION: In this project we are going to design a CMOS positive edge triggered master-slave

More information

Prototyping an ASIC with FPGAs. By Rafey Mahmud, FAE at Synplicity.

Prototyping an ASIC with FPGAs. By Rafey Mahmud, FAE at Synplicity. Prototyping an ASIC with FPGAs By Rafey Mahmud, FAE at Synplicity. With increased capacity of FPGAs and readily available off-the-shelf prototyping boards sporting multiple FPGAs, it has become feasible

More information

High Performance Dynamic Hybrid Flip-Flop For Pipeline Stages with Methodical Implanted Logic

High Performance Dynamic Hybrid Flip-Flop For Pipeline Stages with Methodical Implanted Logic High Performance Dynamic Hybrid Flip-Flop For Pipeline Stages with Methodical Implanted Logic K.Vajida Tabasum, K.Chandra Shekhar Abstract-In this paper we introduce a new high performance dynamic hybrid

More information

CHAPTER 6 ASYNCHRONOUS QUASI DELAY INSENSITIVE TEMPLATES (QDI) BASED VITERBI DECODER

CHAPTER 6 ASYNCHRONOUS QUASI DELAY INSENSITIVE TEMPLATES (QDI) BASED VITERBI DECODER 80 CHAPTER 6 ASYNCHRONOUS QUASI DELAY INSENSITIVE TEMPLATES (QDI) BASED VITERBI DECODER 6.1 INTRODUCTION Asynchronous designs are increasingly used to counter the disadvantages of synchronous designs.

More information

International Journal of Scientific & Engineering Research, Volume 5, Issue 9, September ISSN

International Journal of Scientific & Engineering Research, Volume 5, Issue 9, September ISSN International Journal of Scientific & Engineering Research, Volume 5, Issue 9, September-2014 917 The Power Optimization of Linear Feedback Shift Register Using Fault Coverage Circuits K.YARRAYYA1, K CHITAMBARA

More information

A Power Efficient Flip Flop by using 90nm Technology

A Power Efficient Flip Flop by using 90nm Technology A Power Efficient Flip Flop by using 90nm Technology Mrs. Y. Lavanya Associate Professor, ECE Department, Ramachandra College of Engineering, Eluru, W.G (Dt.), A.P, India. Email: lavanya.rcee@gmail.com

More information

A Symmetric Differential Clock Generator for Bit-Serial Hardware

A Symmetric Differential Clock Generator for Bit-Serial Hardware A Symmetric Differential Clock Generator for Bit-Serial Hardware Mitchell J. Myjak and José G. Delgado-Frias School of Electrical Engineering and Computer Science Washington State University Pullman, WA,

More information

Timing Error Detection: An Adaptive Scheme To Combat Variability EE241 Final Report Nathan Narevsky and Richard Ott {nnarevsky,

Timing Error Detection: An Adaptive Scheme To Combat Variability EE241 Final Report Nathan Narevsky and Richard Ott {nnarevsky, Timing Error Detection: An Adaptive Scheme To Combat Variability EE241 Final Report Nathan Narevsky and Richard Ott {nnarevsky, tomott}@berkeley.edu Abstract With the reduction of feature sizes, more sources

More information

University College of Engineering, JNTUK, Kakinada, India Member of Technical Staff, Seerakademi, Hyderabad

University College of Engineering, JNTUK, Kakinada, India Member of Technical Staff, Seerakademi, Hyderabad Power Analysis of Sequential Circuits Using Multi- Bit Flip Flops Yarramsetti Ramya Lakshmi 1, Dr. I. Santi Prabha 2, R.Niranjan 3 1 M.Tech, 2 Professor, Dept. of E.C.E. University College of Engineering,

More information

Fully Static and Compressed Topology Using Power Saving in Digital circuits for Reduced Transistor Flip flop

Fully Static and Compressed Topology Using Power Saving in Digital circuits for Reduced Transistor Flip flop Fully Static and Compressed Topology Using Power Saving in Digital circuits for Reduced Transistor Flip flop 1 S.Mounika & 2 P.Dhaneef Kumar 1 M.Tech, VLSIES, GVIC college, Madanapalli, mounikarani3333@gmail.com

More information

DIFFERENTIAL CONDITIONAL CAPTURING FLIP-FLOP TECHNIQUE USED FOR LOW POWER CONSUMPTION IN CLOCKING SCHEME

DIFFERENTIAL CONDITIONAL CAPTURING FLIP-FLOP TECHNIQUE USED FOR LOW POWER CONSUMPTION IN CLOCKING SCHEME DIFFERENTIAL CONDITIONAL CAPTURING FLIP-FLOP TECHNIQUE USED FOR LOW POWER CONSUMPTION IN CLOCKING SCHEME Mr.N.Vetriselvan, Assistant Professor, Dhirajlal Gandhi College of Technology Mr.P.N.Palanisamy,

More information

Low Power Illinois Scan Architecture for Simultaneous Power and Test Data Volume Reduction

Low Power Illinois Scan Architecture for Simultaneous Power and Test Data Volume Reduction Low Illinois Scan Architecture for Simultaneous and Test Data Volume Anshuman Chandra, Felix Ng and Rohit Kapur Synopsys, Inc., 7 E. Middlefield Rd., Mountain View, CA Abstract We present Low Illinois

More information

Designing for High Speed-Performance in CPLDs and FPGAs

Designing for High Speed-Performance in CPLDs and FPGAs Designing for High Speed-Performance in CPLDs and FPGAs Zeljko Zilic, Guy Lemieux, Kelvin Loveless, Stephen Brown, and Zvonko Vranesic Department of Electrical and Computer Engineering University of Toronto,

More information

Power Efficient Design of Sequential Circuits using OBSC and RTPG Integration

Power Efficient Design of Sequential Circuits using OBSC and RTPG Integration Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 2, Issue. 9, September 2013,

More information

Fault Detection And Correction Using MLD For Memory Applications

Fault Detection And Correction Using MLD For Memory Applications Fault Detection And Correction Using MLD For Memory Applications Jayasanthi Sambbandam & G. Jose ECE Dept. Easwari Engineering College, Ramapuram E-mail : shanthisindia@yahoo.com & josejeyamani@gmail.com

More information

Optimizing area of local routing network by reconfiguring look up tables (LUTs)

Optimizing area of local routing network by reconfiguring look up tables (LUTs) Vol.2, Issue.3, May-June 2012 pp-816-823 ISSN: 2249-6645 Optimizing area of local routing network by reconfiguring look up tables (LUTs) Sathyabhama.B 1 and S.Sudha 2 1 M.E-VLSI Design 2 Dept of ECE Easwari

More information

LOW POWER AND HIGH PERFORMANCE SHIFT REGISTERS USING PULSED LATCH TECHNIQUE

LOW POWER AND HIGH PERFORMANCE SHIFT REGISTERS USING PULSED LATCH TECHNIQUE OI: 10.21917/ijme.2018.0088 LOW POWER AN HIGH PERFORMANCE SHIFT REGISTERS USING PULSE LATCH TECHNIUE Vandana Niranjan epartment of Electronics and Communication Engineering, Indira Gandhi elhi Technical

More information

SYNCHRONOUS DERIVED CLOCK AND SYNTHESIS OF LOW POWER SEQUENTIAL CIRCUITS *

SYNCHRONOUS DERIVED CLOCK AND SYNTHESIS OF LOW POWER SEQUENTIAL CIRCUITS * SYNCHRONOUS DERIVED CLOCK AND SYNTHESIS OF LOW POWER SEUENTIAL CIRCUITS * Wu Xunwei (Department of Electronic Engineering Hangzhou University Hangzhou 328) ing Wu Massoud Pedram (Department of Electrical

More information

Partial Bus Specific Clock Gating With DPL Based DDFF Design

Partial Bus Specific Clock Gating With DPL Based DDFF Design International Journal of Inventions in Computer Science and Engineering, Volume 2 Issue 4 April 2015 Partial Bus Specific Clock Gating With DPL Based DDFF Design For Low Power Application Reshmachandran

More information

DC Ultra. Concurrent Timing, Area, Power and Test Optimization. Overview

DC Ultra. Concurrent Timing, Area, Power and Test Optimization. Overview DATASHEET DC Ultra Concurrent Timing, Area, Power and Test Optimization DC Ultra RTL synthesis solution enables users to meet today s design challenges with concurrent optimization of timing, area, power

More information

Design of Polar List Decoder using 2-Bit SC Decoding Algorithm V Priya 1 M Parimaladevi 2

Design of Polar List Decoder using 2-Bit SC Decoding Algorithm V Priya 1 M Parimaladevi 2 IJSRD - International Journal for Scientific Research & Development Vol. 3, Issue 03, 2015 ISSN (online): 2321-0613 V Priya 1 M Parimaladevi 2 1 Master of Engineering 2 Assistant Professor 1,2 Department

More information

DESIGN OF DOUBLE PULSE TRIGGERED FLIP-FLOP BASED ON SIGNAL FEED THROUGH SCHEME

DESIGN OF DOUBLE PULSE TRIGGERED FLIP-FLOP BASED ON SIGNAL FEED THROUGH SCHEME Scientific Journal Impact Factor (SJIF): 1.711 e-issn: 2349-9745 p-issn: 2393-8161 International Journal of Modern Trends in Engineering and Research www.ijmter.com DESIGN OF DOUBLE PULSE TRIGGERED FLIP-FLOP

More information

At-speed Testing of SOC ICs

At-speed Testing of SOC ICs At-speed Testing of SOC ICs Vlado Vorisek, Thomas Koch, Hermann Fischer Multimedia Design Center, Semiconductor Products Sector Motorola Munich, Germany Abstract This paper discusses the aspects and associated

More information

Abstract 1. INTRODUCTION. Cheekati Sirisha, IJECS Volume 05 Issue 10 Oct., 2016 Page No Page 18532

Abstract 1. INTRODUCTION. Cheekati Sirisha, IJECS Volume 05 Issue 10 Oct., 2016 Page No Page 18532 www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 5 Issue 10 Oct. 2016, Page No. 18532-18540 Pulsed Latches Methodology to Attain Reduced Power and Area Based

More information

Dual-V DD and Input Reordering for Reduced Delay and Subthreshold Leakage in Pass Transistor Logic

Dual-V DD and Input Reordering for Reduced Delay and Subthreshold Leakage in Pass Transistor Logic Dual-V DD and Input Reordering for Reduced Delay and Subthreshold Leakage in Pass Transistor Logic Jeff Brantley and Sam Ridenour ECE 6332 Fall 21 University of Virginia @virginia.edu ABSTRACT

More information

An Efficient High Speed Wallace Tree Multiplier

An Efficient High Speed Wallace Tree Multiplier Chepuri satish,panem charan Arur,G.Kishore Kumar and G.Mamatha 38 An Efficient High Speed Wallace Tree Multiplier Chepuri satish, Panem charan Arur, G.Kishore Kumar and G.Mamatha Abstract: The Wallace

More information

Area Efficient Pulsed Clock Generator Using Pulsed Latch Shift Register

Area Efficient Pulsed Clock Generator Using Pulsed Latch Shift Register International Journal for Modern Trends in Science and Technology Volume: 02, Issue No: 10, October 2016 http://www.ijmtst.com ISSN: 2455-3778 Area Efficient Pulsed Clock Generator Using Pulsed Latch Shift

More information

LFSR Counter Implementation in CMOS VLSI

LFSR Counter Implementation in CMOS VLSI LFSR Counter Implementation in CMOS VLSI Doshi N. A., Dhobale S. B., and Kakade S. R. Abstract As chip manufacturing technology is suddenly on the threshold of major evaluation, which shrinks chip in size

More information

Using on-chip Test Pattern Compression for Full Scan SoC Designs

Using on-chip Test Pattern Compression for Full Scan SoC Designs Using on-chip Test Pattern Compression for Full Scan SoC Designs Helmut Lang Senior Staff Engineer Jens Pfeiffer CAD Engineer Jeff Maguire Principal Staff Engineer Motorola SPS, System-on-a-Chip Design

More information

DESIGN AND SIMULATION OF A CIRCUIT TO PREDICT AND COMPENSATE PERFORMANCE VARIABILITY IN SUBMICRON CIRCUIT

DESIGN AND SIMULATION OF A CIRCUIT TO PREDICT AND COMPENSATE PERFORMANCE VARIABILITY IN SUBMICRON CIRCUIT DESIGN AND SIMULATION OF A CIRCUIT TO PREDICT AND COMPENSATE PERFORMANCE VARIABILITY IN SUBMICRON CIRCUIT Sripriya. B.R, Student of M.tech, Dept of ECE, SJB Institute of Technology, Bangalore Dr. Nataraj.

More information

Clock Tree Power Optimization of Three Dimensional VLSI System with Network

Clock Tree Power Optimization of Three Dimensional VLSI System with Network Clock Tree Power Optimization of Three Dimensional VLSI System with Network M.Saranya 1, S.Mahalakshmi 2, P.Saranya Devi 3 PG Student, Dept. of ECE, Syed Ammal Engineering College, Ramanathapuram, Tamilnadu,

More information

Digital Integrated Circuits EECS 312. Review. Remember the ENIAC? IC ENIAC. Trend for one company. First microprocessor

Digital Integrated Circuits EECS 312. Review. Remember the ENIAC? IC ENIAC. Trend for one company. First microprocessor 14 12 10 8 6 IBM ES9000 Bipolar Fujitsu VP2000 IBM 3090S Pulsar 4 IBM 3090 IBM RY6 CDC Cyber 205 IBM 4381 IBM RY4 2 IBM 3081 Apache Fujitsu M380 IBM 370 Merced IBM 360 IBM 3033 Vacuum Pentium II(DSIP)

More information

Chapter 10 Exercise Solutions

Chapter 10 Exercise Solutions VLSI Test Principles and Architectures Ch. 10 oundary Scan & Core-ased Testing P. 1/10 Chapter 10 Exercise Solutions 10.1 The following is just an example for testing chips and interconnects on a board.

More information

Design of Test Circuits for Maximum Fault Coverage by Using Different Techniques

Design of Test Circuits for Maximum Fault Coverage by Using Different Techniques Design of Test Circuits for Maximum Fault Coverage by Using Different Techniques Akkala Suvarna Ratna M.Tech (VLSI & ES), Department of ECE, Sri Vani School of Engineering, Vijayawada. Abstract: A new

More information

Digital Phase Adjustment Scheme 0 6/3/98, Chaney. A Digital Phase Adjustment Circuit for ATM and ATM- like Data Formats. by Thomas J.

Digital Phase Adjustment Scheme 0 6/3/98, Chaney. A Digital Phase Adjustment Circuit for ATM and ATM- like Data Formats. by Thomas J. igital Phase Adjustment Scheme 6/3/98, haney A igital Phase Adjustment ircuit for ATM and ATM- like ata Formats by Thomas J. haney epartment of omputer Science University St. Louis, Missouri 633 tom@arl.wustl.edu

More information
