Extended TSPC Structures With Double Input/Output Data Throughput for Gigahertz CMOS Circuit Design


IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 10, NO. 3, JUNE 2002 301

João Navarro S., Jr., and Wilhelmus A. M. Van Noije

Abstract: New structures to be applied with the extended true-single-phase-clock (E-TSPC) CMOS circuit technique, an extension of the traditional true-single-phase-clock (TSPC) technique [1], [2], are presented. These structures, formed by the connection of proper data paths, allow circuits to handle data at rates that are twice the clock rate. Examples of circuits employing such structures are briefly reported and, to illustrate more complex applications, the design of a dual-modulus prescaler (divide by 128/129) in a 0.8 µm CMOS process is described in full. This prescaler, according to simulations, reaches a maximum operating rate of 2.19 GHz at 5 V with a power consumption of 46 mW. The new approach is also compared with a previous design (implemented with the E-TSPC technique and attaining a 1.59 GHz operating rate) and with other recently published circuits.

Index Terms: CMOS, digital high-speed design, dual-modulus prescaler, low power, true-single-phase-clock (TSPC).

I. INTRODUCTION

From the early days of CMOS technology up to the present, several clock policies have been proposed for the implementation of CMOS circuits. The number of clock phases, a major clock feature, has undergone several changes. The pseudo two-phase logic was one of the earliest techniques proposed [3]; later on, two-phase logic structures were introduced and advanced. The domino technique [4], which successfully associated two-phase circuits and dynamic gates, and the NORA technique [5], an extensive no-race approach for two-phase and dynamic circuits, are landmarks of this advance. The first single-phase clock policy, called the true single-phase clock (TSPC) [6], was introduced only in the late 1980s.
Single-phase clock policies offer superior characteristics, since their usage simplifies clock distribution on the chip and reduces the transistor count. Thus, higher frequencies and simpler designs can be achieved. In the 1990s, several new TSPC features were proposed [7], among them a comprehensive extension of the TSPC [1], the extended true-single-phase-clock CMOS circuit technique (E-TSPC), which consists of composition rules for single-phase circuits using complementary static, dynamic, latch, data-precharged [7], and NMOS-like blocks (ratioed logic blocks) [1], [2].

The main purpose of this paper is the introduction of new structures in the E-TSPC technique to build circuits handling data at rates that are twice the clock rate. These structures are formed by the connection of certain n and p data chains, leading to circuits with lower power consumption, higher speed, or both. Further, the design of a dual-modulus prescaler (divide by 128/129) with the proposed structures in a standard 0.8 µm CMOS process (0.7 µm effective channel length) is detailed, and the simulation results are compared with a previous E-TSPC implementation and with other recently published prescalers. The prescaler implementation aims to evaluate the potential of the proposed new structures.

This paper is organized as follows. In Section II, the E-TSPC technique is concisely reviewed, and then, in Section III, the new proposed structures are presented with some configuration examples.

Manuscript received August 4, 2000. This work was supported in part by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and in part by the Fundação de Amparo à Pesquisa do Estado de São Paulo, Brazil. The authors are with the Department of Electronic Systems, EPUSP, University of São Paulo, São Paulo, Brazil (e-mail: navarro@lsi.usp.br; noije@lsi.usp.br). Publisher Item Identifier S 1063-8210(02)03194-3.

In Section IV, some circuit examples are depicted and the prescaler design is analyzed. Results of the prescaler and comparisons are reported in Section V, and the main conclusions are drawn in Section VI.

II. THE E-TSPC CIRCUIT TECHNIQUE

The allowed blocks in E-TSPC circuits have already been listed above, and most of them are well known. Owing to the nonstandard nomenclature used and the importance of the block, the latch blocks and their NMOS-like versions are shown in Fig. 1. Although these blocks do not execute a true latch function, their presence is indispensable in any data chain for the holding operation. In the latches of Fig. 1, the clocked transistors of the n- and p-latches are placed close to the power rail, as suggested by [8]. Blocks with this configuration can attain a higher speed but suffer from charge-sharing problems. Latch configurations with clocked transistors close to the block output are also admissible.

Note that a new terminology associated with data-precharged blocks [1], [2], with terms like pc or non-pc inputs, PH and PL blocks, and n-dp and p-dp blocks, is used in both Definition 1 and Table I. Data-precharged blocks are blocks whose output precharge is controlled by some of the data signal inputs, the so-called pc-inputs, and not by the clock signal. In a PH data-precharged block, the precharge is done when all pc-inputs are high; similarly, in a PL block, the precharge is done when all pc-inputs are low. If a PH (PL) block has all of its pc-inputs high (low) whenever the clock is low, thus performing the output precharge, the block is also called an n-dp block; likewise, if a PH (PL) block has all of its pc-inputs high (low) whenever the clock is high, the block is called a p-dp block.

In E-TSPC circuits, the block connections should be made according to composition rules. Since the concept of a data chain is fundamental for understanding the rules, the definition of a data chain is presented first.

1063-8210/02$17.00 © 2002 IEEE

Fig. 1. The latch blocks of the E-TSPC circuit technique: (a) n-latch. (b) NMOS-like n-latch. (c) p-latch. (d) NMOS-like p-latch.

TABLE I. CONSTRAINTS CONCERNING THE NUMBER OF INVERSIONS BETWEEN ADJACENT BLOCKS [1], [2]. n.r.: no restrictions exist; n.a.: the connection is not allowed; even: an even number of blocks is required; odd: an odd number of blocks is required.

Definition 1: An n-data chain is any noncyclic signal propagation path:
1) containing at least one n-latch, or one n-dynamic, or one n-dp block;
2) starting at a circuit external input, or at the output of a p-latch, or p-dynamic, or p-dp block; when this output is followed by static blocks in the normal data flow, the data chain starts at the output of the last static block;
3) going through static, n-dynamic, n-dp, or n-latch blocks;
4) regardless of the number and order of the blocks defined above;
5) finishing at a circuit external output or at the input of the first p-latch, or p-dynamic, or p-dp block.

For the p-data chains, an equivalent definition applies, replacing n with p and vice versa. When the clock is high, n-data chains are in the evaluation phase; otherwise they are in the holding phase. P-data chains evaluate when the clock is low.

In Fig. 2, part of a circuit schematic is depicted with seven complete n-data chains. Some examples are the n-data chain starting at input and going through blocks,,, and ; the n-data chain starting at and going through,,,, and ; and the n-data chain starting at and going through,,, and.
Five of the six E-TSPC composition rules [1], [2] can be merged into one general rule, presented as follows.

General Rule for Data Chains: An n (p) data chain must present one of the two following configurations:
a) it holds at least two blocks, one dynamic block and one latch block, with an even number of inversions between these blocks;
b) it holds at least two latches, with an even number of inversions between these blocks.
Additionally, adjacent blocks in the same data chain must keep between them an even or odd number of blocks (inversions) according to the constraints of Table I (two blocks are called adjacent if only static blocks are placed between them).

Note that the three n-data chains listed for Fig. 2 conform to the general rule, and also that this rule allows configurations that would be considered faulty if other composition rules from the literature were applied. For example, according to the composition rules presented in [7], the most comprehensive composition rules for TSPC, blocks and should not be interconnected, and blocks and should not be interposed between blocks and.

The rule described above is sufficient to ensure that data-precharged gates are precharged, that dynamic gates are not affected by incorrect discharges, and that the output of the last latch of the data chain is steady at the end of holding phases. Conformance to the rule is not, however, necessary for the correct operation of the circuit. In fact, typical TSPC circuits employ the D-flip-flop (D-FF) of Fig. 3, which does not conform to the general rule but operates correctly if proper delays exist. For this reason, an exception rule, covering configurations similar to the TSPC D-FF, is added as the sixth rule [1], [2].
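The two configurations of the general rule can be turned into a small checker. The Python sketch below is only an illustration under stated assumptions: a data chain is modeled as an ordered list of blocks, "inversions between" two blocks is taken to be the number of inverting blocks strictly between them, and the adjacency constraints of Table I are not checked because the table body is not reproduced here. The names `Block`, `inversions_between`, and `satisfies_general_rule` are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Block:
    kind: str               # assumed labels: "static", "dynamic", "dp", "latch"
    inverting: bool = True  # assumption: each block inverts unless stated otherwise


def inversions_between(chain, i, j):
    """Count inverting blocks strictly between positions i and j (assumed meaning)."""
    return sum(1 for block in chain[i + 1:j] if block.inverting)


def satisfies_general_rule(chain):
    """Check the two configurations of the general rule; Table I is not modeled."""
    for i, j in combinations(range(len(chain)), 2):
        kinds = {chain[i].kind, chain[j].kind}
        # One dynamic block plus one latch, or two latches.
        qualifying_pair = kinds == {"dynamic", "latch"} or kinds == {"latch"}
        if qualifying_pair and inversions_between(chain, i, j) % 2 == 0:
            return True
    return False


if __name__ == "__main__":
    dynamic_then_latch = [Block("dynamic"), Block("latch")]
    latch_inverter_latch = [Block("latch"), Block("static"), Block("latch")]
    print(satisfies_general_rule(dynamic_then_latch))    # pair with zero inversions
    print(satisfies_general_rule(latch_inverter_latch))  # pair with one inversion
```

A chain that fails this check is not necessarily broken: as the text notes, the TSPC D-FF falls outside the general rule and is covered by the separate exception rule.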

Fig. 2. Example of n-data chains. The blocks mentioned in the text are named and hatched in the figure.

Fig. 3. Two TSPC D-flip-flops connected in series. A circuit example that does not conform to the general rule but usually operates correctly.

III. E-TSPC NEW STRUCTURES

To understand the new structures discussed below, two characteristics of data-chain operation should be distinguished. The first characteristic is found in data chains, n or p, where dynamic and data-precharged blocks are not present. For these data chains, here called fi-data chains (data chains with fusible input), input alterations during evaluation phases do not cause undesirable discharges, so the data chain output will yield the correct value.¹ The second characteristic is found in data chains, n or p, where there is a single latch that is also the last block of the data chain. In consequence, the data chain must comply with the first configuration of the general rule (one dynamic block and one latch), since it cannot hold two latches. For these data chains, here called fo-data chains (data chains with fusible output), the output during holding phases keeps the result calculated in the previous evaluation phase but is in a high-impedance state.

The described characteristics make feasible input and output structures that handle input data and provide output data at twice the clock rate. The input structures are obtained through the connection of the inputs of fi-n and fi-p data chains; as a result, while the clock signal is high, the input data go to the n-data chains, and, while the clock signal is low, the data go to the p-data chains.

¹ The input of the data chain is handled like a block output.

The output structures are obtained through the connection of the outputs of fo-n and fo-p data chains (in case of more than one n (p) data chain, a unique latch must be the last latch of all n (p) data chains); similarly to the input structures, while the clock is high, the output data come from the n-data chains, and otherwise from the p-data chains.

The combination of those structures allows new complex designs working with two data evaluations per clock cycle. Some simple examples are presented in Fig. 4. The input data rate in Fig. 4(a) is twice the clock rate and the rate of the two outputs is equal to the clock rate. In contrast, the rate of the two inputs in Fig. 4(b) is equal to the clock rate and the output rate is doubled. Finally, in Fig. 4(c), both input and output rates are doubled.

Different state machine configurations can also be adopted to meet the input and output throughput requirements. In Fig. 5, two examples are shown. The input data rate, the output data rate, and the present-state data rate are twice the clock rate in the configuration of Fig. 5(a). For input rates equal to the clock rate and a doubled output rate, a configuration like the one in Fig. 5(b) can be used.

IV. CIRCUIT EXAMPLES

The input and output structures explained above can be employed to advantage in designs where high speed is pursued; additionally, since speed can be traded off against power consumption by reducing transistor dimensions or power supply values, lower power consumption can alternatively be reached [9]. We will depict some design examples to illustrate the advantages of the new structures.

Several circuits have already been implemented using the combinations presented in Fig. 4. In [2], the proposed 1:8 demultiplexer with byte aligner and the 8:1 multiplexer, both implemented in a 0.8 µm CMOS technology, are examples of these designs.

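The input and output structures of Section III can be pictured with a purely behavioral model. The Python sketch below is an idealized illustration, not a circuit simulation: it ignores delays, charge sharing, and the high-impedance hand-over between chains, and it assumes each clock cycle starts with the high phase. The function names `split_inputs` and `fuse_outputs` are hypothetical.

```python
def split_inputs(stream):
    """Fused input: samples arriving while the clock is high go to the n-data
    chains, samples arriving while it is low go to the p-data chains."""
    n_samples = stream[0::2]  # clock-high half-cycles (assumed to come first)
    p_samples = stream[1::2]  # clock-low half-cycles
    return n_samples, p_samples


def fuse_outputs(n_stream, p_stream):
    """Fused output: the shared node carries the n-chain value while the clock
    is high and the p-chain value while it is low."""
    fused = []
    for n_value, p_value in zip(n_stream, p_stream):
        fused.append(n_value)  # clock high: n-data chain evaluates and drives
        fused.append(p_value)  # clock low: p-data chain evaluates and drives
    return fused


if __name__ == "__main__":
    # Two streams at the clock rate become one stream at twice the clock rate.
    print(fuse_outputs([1, 0, 1], [0, 0, 1]))
    # One stream at twice the clock rate becomes two streams at the clock rate.
    print(split_inputs([1, 0, 0, 0, 1, 1]))
```

In this abstraction `split_inputs` corresponds to the kind of 1:2 demultiplexing of Fig. 4(a) and `fuse_outputs` to the 2:1 multiplexing of Fig. 4(b); applying one after the other returns the original data.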
Fig. 4. Structures to double the data rate; the cross-hatched blocks have to be used if the output data (a) or the input data (b) are synchronized.

Fig. 5. State machine configurations.

Fig. 6. (a) Transistor schematic of the 1:8 demultiplexer. (b) Basic 2:1 multiplexer block.

Fig. 7. Schematic of the dual-modulus prescaler (divide-by-128/129).

In the demultiplexer design, the input data is pushed onto two parallel shift paths, one dedicated to the even bits and the other to the odd bits, in order to detect the A1 framing bytes (11110110) [10]. A circuit based on the natural 1:2 demultiplexer of the structure in Fig. 4(a) is used to distribute the input data to the two shift paths, as detailed in Fig. 6(a); the full 1:8 demultiplexer reached a measured maximum operating rate of 1.38 Gb/s at 4.7 V with 349 mW power consumption.

In the multiplexer design, the input data is joined by several simple 2:1 multiplexers, the central block of the circuit. A circuit based on the 2:1 multiplexer of the structure in Fig. 4(b) is used in this task, as detailed in Fig. 6(b), and the full 8:1 multiplexer reached a measured maximum operating rate of 1.7 Gb/s at 5 V with 87.7 mW [11]. Both designs perform very favorably when compared with other implementations. In addition, the use of D-FFs triggered by both clock edges, with a structure similar to the one in Fig. 4(c), has already been suggested in the literature [12].

To illustrate the application of state machine configurations, we describe the design of a high-speed dual-modulus prescaler (divide by 128/129), using a standard 0.8 µm CMOS bulk process (ES2/ATMEL CMOS). Prescalers are employed in frequency synthesis systems and have been frequently used to compare different high-speed circuit techniques [13]-[15]. In Fig. 7, the schematic diagram of a prescaler is depicted.
Two parts can be identified in the diagram: the first part, inside the cross-hatched box, is composed of three D-FFs and two logic gates, and forms a synchronous divide-by-4/5 counter [see the timing diagram in Fig. 8(a)]; the other part, at the bottom of the figure, is composed of five D-FFs and forms an asynchronous divide-by-32 counter. The div32 signal, generated by the asynchronous counter, selects whether the divide-by-4/5 counter counts up to 4 ( high) or up to 5 ( low). The division ratio of the prescaler, 128 or 129, is selected according to the signal value.
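The division ratios follow from simple counting: the divide-by-32 counter advances once per output period of the divide-by-4/5 counter, so 32 periods of 4 input cycles give 128, and replacing one of them with a period of 5 gives 129. The Python sketch below checks this arithmetic behaviorally. It is an illustration only: the parameter name `select_129` is hypothetical, and which of the 32 slow-counter states triggers the single count of 5 is an arbitrary assumption, not taken from the schematic.

```python
def input_cycles_per_output(select_129: bool) -> int:
    """Count input clock cycles in one full period of the divide-by-32 counter."""
    total_input_cycles = 0
    for slow_state in range(32):  # one pass through the divide-by-32 counter
        # Assumption: the extra input cycle is taken in the last slow state.
        count_to_five = select_129 and slow_state == 31
        total_input_cycles += 5 if count_to_five else 4
    return total_input_cycles


if __name__ == "__main__":
    print(input_cycles_per_output(False))  # 32 * 4
    print(input_cycles_per_output(True))   # 31 * 4 + 5
```

Because only the divide-by-4/5 counter runs at the full input frequency, it is the natural place to apply the doubled-rate structures.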

Fig. 8. The timing diagram (a) and the transition diagram (b) of the synchronous divide-by-4/5 counter.

Fig. 9. The transition diagram (a) and the timing diagram (b) of the new state machine which executes the synchronous division.

In this prescaler, the synchronous counter is the critical part in terms of speed. It may be treated as a state machine with one input, the div32 signal, and one output, the A signal; the transition diagram of this state machine is shown in Fig. 8(b). The states of the machine are encoded by signals A, B, and C. Using the configuration in Fig. 5(b), we can build a state machine with the same input/output pair; the transition diagram of such a machine is depicted in Fig. 9(a). In this case, the state machine clock rate is half the original clock rate. To generate the counter output, the signal which will feed the asynchronous counter, the output of a fo-n data chain and the output of a fo-p data chain, conveying signal A and signal B respectively, are fused. As a result, the counter output carries the A value when the state machine clock is high and the B value when the state machine clock is low. In Fig. 9(b), the timing diagram of the new divide-by-4/5 counter is shown. Note that when the machine is executing the divide-by-4 operation, two cases are expected: the machine moving back and forth between the 000 and 110 states or between the 100 and 010 states.

In Fig. 10, the transistor schematic of the new divide-by-4/5 counter is depicted with the transistor dimensions in µm. The three cross-hatched boxes mark the positive edge-triggered D-FFs; the fusion of signals A and B is done through the data chains sketched in the upper portion of the figure. Note that small transistors are used in the design.

V. RESULTS

A full prescaler circuit layout was formed with the divide-by-4/5 counter considered above.
Conventional positive edge-triggered TSPC D-FFs (Fig. 3) are used for all the flip-flops of the asynchronous counter except one: the flip-flop clocked directly by the synchronous counter. For this one, the conventional positive edge-triggered D-FF was slightly modified to reach higher speed (an NMOS-like p-latch block is used as the first block of the flip-flop). The division of the clock signal to create the clk/2 signal (Fig. 10) is performed by a negative edge-triggered D-FF with a modified configuration [13] for speed optimization.

The performance of the new prescaler was evaluated through SPICE simulations (level two typical parameters, at room temperature) of the netlist extracted from the layout. The simulation results are compared with results of the prescaler described in [16], which has the following characteristics: it was designed with the E-TSPC technique; the process used is the same ES2/ATMEL CMOS process of this work; and small transistor sizes were also adopted.

Fig. 10. Transistor schematic of the new synchronous divide-by-4/5 counter. The transistor widths, in µm, are indicated in the figure; the transistor lengths are 0.8 µm. The signal clk/2 is the input clock divided by 2.

Fig. 11. Results for the prescalers' maximum input frequency versus the power supply. Both the simulation and the measurement results were obtained using an input pulse with maximum excursion of 3 V.

Fig. 12. The power performance of the two prescalers (simulation results). (a) The total power consumption at maximum speed versus the prescaler input frequency for different values of power supply. (b) The power consumption terms versus the power supply.

TABLE II. SOME PRESCALER RESULTS ARE SUMMARIZED. Note that, for different works, the partial (without clock buffer consumption), the total, or, in a few cases, both power consumption results are supplied. The cross-hatched values of the table were found by simulations.
* It is not clear whether the power consumption value comprises the clock buffer consumption.

In Fig. 11, the simulated maximum input frequency results of the new prescaler and both the simulated and the measured maximum input frequency results of the prescaler in [16] are shown. For both the measurements and the simulations, the maximum input clock excursion is 3 V, since the pulse generator employed in the measurements had a 3 V maximum excursion. The graph shows the significant gain in speed, over 50%, provided by the new implementation.

The power performances of the two circuits are presented in Fig. 12 (simulation results only). The total power consumption is calculated as the sum of three terms: the power consumption of the clock buffer (for the new prescaler, this term also includes the power consumption of the D-FF that divides the clock signal by 2); the power consumption of the synchronous divide-by-4/5 counter; and the power consumption of the asynchronous counter. In Fig. 12(a), the total power consumption for each prescaler at maximum speed and at different values of power supply is depicted. The graph shows that the new prescaler not only reaches higher speed but also consumes considerably less power if the two prescalers operate with the same input signal frequency and with the minimum needed power-supply voltage; for instance, the prescaler in [16] can reach 1.4 GHz with a power supply of 4.8 V and consumes 34 mW, whereas the new prescaler may reach the same frequency with a power supply of 3.2 V and consumes less than 12 mW. In Fig. 12(b), the contribution of the power terms and the total power are drawn; in this case, the input frequency for both circuits is equal to the maximum speed reached by the prescaler described in [16]. The graph shows that, despite the higher complexity of the new prescaler, the two circuits consume nearly the same power when working with the same power supply and the same input frequency.

The performance of different dual-modulus prescalers presented in the literature and our test outcomes are summarized in Table II. Although a comparison among the implementations is feasible, some caution should be taken in the analysis, mainly regarding the power-consumption data. We notice that for some papers used in this work, [13], [15], [16], [17], the authors do not state which power consumption terms were considered in the power results (we found, through private communication with the authors, that in [13], [15], [16] the presented power results do not comprise the clock buffer consumption). Table II shows that the new implementation has the best power consumption characteristics and is one of the fastest prescalers.

VI. CONCLUSION

The enlargement of the E-TSPC technique with new structures that may double the input and output data rates was reported. Examples of circuits, a 1:8 demultiplexer, an 8:1 multiplexer, and a prescaler, were given to illustrate the applications of these structures. In particular, the detailed design of a dual-modulus prescaler (divide by 128/129), developed in a 0.8 µm CMOS process, was studied. The complete layout was drawn and its netlist extracted for SPICE simulations. The simulated circuit attained 2.19 GHz and a power consumption of 20.9 µW/MHz at 5 V (the power consumption of the clock buffer is included). The results, compared with other implementations, confirm the advantages of the proposed structures.

ACKNOWLEDGMENT

The authors would like to thank J. Park and H. Yan for the valuable information concerning the prescaler measurement results of [13] and [15].

REFERENCES

[1] J. Navarro and W. Van Noije, "E-TSPC: Extended True Single Phase Clock CMOS circuit technique," in VLSI: Integrated Syst. Silicon, IFIP Int. Conf. VLSI, R. Reis and L. Claesen, Eds., London, U.K., 1997, pp. 165-176.
[2] J. Navarro, "Design techniques for high speed CMOS ASICs," Ph.D. dissertation, Univ. São Paulo, Dept. Elect. Eng., São Paulo, Brazil, 1998.
[3] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design, 2nd ed. Reading, MA: Addison-Wesley, 1993.
[4] R. H. Krambeck, C. M. Lee, and H.-F. S. Law, "High-speed compact circuits with CMOS," IEEE J. Solid-State Circuits, vol. 17, pp. 614-619, June 1982.
[5] N. F. Gonçalves, "NORA: a racefree CMOS technique for register transfer systems," Ph.D. dissertation, Katholieke Universiteit Leuven, Leuven, Belgium, 1984.
[6] Y. Ji-ren, I. Karlsson, and C. Svensson, "A true single-phase-clock dynamic CMOS circuit technique," IEEE J. Solid-State Circuits, vol. 22, pp. 899-901, Oct. 1987.
[7] P. Larsson, "Skew safety and logic flexibility in a true single phase clocked system," in Proc. IEEE ISCAS, Seattle, WA, May 1995, pp. 941-944.
[8] Q. Huang, "Speed optimization of edge-triggered nine-transistor D-flip-flop for gigahertz single-phase clocks," in Proc. IEEE ISCAS, Chicago, IL, May 1993, pp. 2118-2121.
[9] A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, 2nd ed. Norwell, MA: Kluwer, 1996.
[10] F. L. Romão, J. Navarro, R. Silveira, and W. Van Noije, "1.2 Gb/s SONET/SDH demux in CMOS technology," in Proc. SBMO/IEEE MTT-S Int. Microwave and Optoelectronics Conf., vol. 1, Rio de Janeiro, BR, July 1995, pp. 52-57.
[11] J. Navarro and W. Van Noije, "Design of an 8:1 MUX at 1.7 Gbit/s in 0.8 µm CMOS technology," in Proc. IEEE Great Lakes Symp. VLSI, Lafayette, IL, Feb. 1998, pp. 103-107.
[12] M. Afghahi and J. Yuan, "Double edge-triggered D-flip-flops for high-speed CMOS circuits," IEEE J. Solid-State Circuits, vol. 26, pp. 1168-1170, Aug. 1991.
[13] B. Chang, J. Park, and W. Kim, "A 1.2 GHz CMOS dual-modulus prescaler using new dynamic D-type flip-flops," IEEE J. Solid-State Circuits, vol. 31, pp. 749-752, May 1996.
[14] C.-Y. Yang, G.-K. Dehng, J.-M. Hsu, and S.-I. Liu, "New dynamic flip-flop for high-speed dual-modulus prescaler," IEEE J. Solid-State Circuits, vol. 33, pp. 1568-1571, Oct. 1998.

[15] H. Yan, M. Biyani, and K. K. O, "A high-speed CMOS dual-phase dynamic-pseudo NMOS ((DP)²) latch and its application in a dual-modulus prescaler," IEEE J. Solid-State Circuits, vol. 34, pp. 1400-1404, Oct. 1999.
[16] J. Navarro and W. Van Noije, "A 1.6-GHz dual modulus prescaler using the Extended True-Single-Phase-Clock CMOS circuit technique (E-TSPC)," IEEE J. Solid-State Circuits, vol. 34, pp. 97-102, Jan. 1999.
[17] J. Craninckx and M. S. J. Steyaert, "A 1.75-GHz/3-V dual-modulus divide-by-128/129 prescaler in 0.7-µm CMOS," IEEE J. Solid-State Circuits, vol. 31, pp. 890-897, July 1996.

Wilhelmus A. M. Van Noije was born in the Netherlands. He received the B.S.E.E. and the M.S.E.E. degrees from the University of São Paulo, Brazil, and the Ph.D. degree in applied science from the Katholieke Universiteit Leuven, Belgium, in 1975, 1978, and 1985, respectively. Since 1987, he has been with the Department of Electrical Systems Engineering, University of São Paulo (PSI/EPUSP), where he became a Full Professor in 1998 and has been the Department Head since 1999. Also, since 1988, he has been the Coordinator of the VLSI Systems Design Division of the Integrated Systems Laboratory (LSI/PSI/EPUSP), and is involved in IC layout synthesis on sea-of-gates (SOG) structures, analog circuits on SOG, high-speed CMOS integrated circuit techniques, and recently in RF circuit design.

João Navarro S., Jr., received the B.S., M.S., and Ph.D. degrees in electrical engineering from the Polytechnic School, University of São Paulo (EPUSP), Brazil, in 1986, 1990, and 1998, respectively. Since 1990, he has been a Research Staff Member at the EPUSP and, since 2001, he has also been a Professor of Computer Science at SENAC, Brazil. His current research interests include high-speed digital circuits, RF designs, and clock distribution.