A Novel Bus Encoding Technique for Low Power VLSI

Jayapreetha Natesan and Damu Radhakrishnan*
Department of Electrical and Computer Engineering
State University of New York
75 S. Manheim Blvd., New Paltz, New York 12561
natesa76@newpaltz.edu, *damu@engr.newpaltz.edu

Abstract

Low power VLSI circuit design is a must for present and future technologies. One way of reducing power in a CMOS circuit is to reduce the number of transitions on the bus, and Bus Invert Coding is a widely used technique for that. In this paper we introduce a new coding scheme, called ShiftInv Coding, that is superior to the bus invert coding technique. Our simulation results show a considerable reduction in the number of transitions over and above that obtained with bus invert coding. Further, the proposed technique requires only 2 extra bits for the low power coding, regardless of the bit-width of the bus, and does not assume anything about the nature of the data.

Keywords: Bus coding, low power, bus invert code, shift invert code, bus transitions.

1. Introduction and Related Work

With the ever increasing complexity of VLSI circuits and an increased focus on mobile computing, low power design techniques have become a must for all aspects of digital design. Over the past few years, a number of coding schemes have been proposed for reducing the transitions on a bus. For data buses, one popular coding scheme is the bus invert coding technique [1], which is suitable for uncorrelated data patterns. A probability based mapping technique is proposed by Ramprasad et al. [2] for patterns with non-uniform probability densities. For instruction address buses, the Gray code [3], the T0 code [4], the Beach code [5] and the inc-xor code [2] have been proposed.

Other variants of bus invert coding include a partial bus invert coding technique [7] and the decomposition approach [6]. Both of these techniques incur an area overhead to determine a suitable partition of the data bus. In addition, the decomposition approach [6] can require up to p-1 extra lines on the bus, where p is the number of partitions of the original data bus. The partial bus invert coding [7] has the limitation that, in spite of the additional hardware, it might not apply bus invert coding to a subset of the bus lines, thereby producing sub-optimal results. Further, partial bus invert coding requires a priori knowledge of the sequence of memory address patterns on the bus.

In this work, we propose a simple yet efficient improvement to the Bus Invert Coding technique. The proposed technique has no additional area overhead for determining transition correlations and transition probabilities, and it does not need any prior knowledge of the address patterns. The data on the bus can be uncorrelated and completely random, just as was the case with the original bus invert coding. The number of extra bus lines needed by this method is always 2, regardless of the bit-width of the bus.

This paper is organized as follows. We first introduce the terminology used in this paper in Section 2. The basics of Bus Invert Coding appear in Section 3, followed by details of the proposed ShiftInv Coding technique in Section 4. Section 5 details a hardware implementation of the proposed technique. The simulation results for a number of test cases are presented in Section 6, followed by conclusions and suggestions for future work in Section 7.

2. Terminology

The following terminology is used in this paper. Let $D = d_{w-1}, d_{w-2}, \ldots, d_0$ represent a binary data string of width $w$. The data string at any time instant $k$ can be represented as $D_k = d_{w-1}^{k}, d_{w-2}^{k}, \ldots, d_0^{k}$.

Let $B_k$ be the data transmitted on the bus at time $k$. Note that the bit width $w'$ of the bus (i.e., of the encoded data that gets transmitted on the bus) could be greater than $w$, depending on the coding scheme used. For instance, in default Bus Invert Coding [1], $w' = w + 1$. For variations of Bus Invert Coding, $w'$ can be greater than $w + 1$. In any coding scheme based on bus inversion, the value of the $i$-th bit $b_i$ on the bus is either the data value $d_i$ or $1 - d_i$. Thus, for all $i$, $0 \le i < w$:

$b_i = d_i$ (uninverted bit) or $b_i = 1 - d_i$ (inverted bit).

3. Bus Invert Coding

The basic idea behind Bus Invert Coding [1] originated from the observation that a lot of power is wasted during data transmission on off-chip bus lines, because of the switching of these high capacitance lines. Power can therefore be saved by minimizing the number of transitions occurring on these bus lines.

Let $N$ be the number of transitions between the data $D_{k+1}$ at time $k+1$ and the value on the bus $B_k$, i.e., $N$ is the total number of bit positions in which the new data and the existing value on the bus differ. Let $D_{k+1}^{(INV)}$ be the inverted data, i.e., for all $i$, $0 \le i < w$, $d_i^{(INV)} = 1 - d_i$. Let $N_{INV}$ be the number of transitions between $D_{k+1}^{(INV)}$ and $B_k$. The Bus Invert Coding technique chooses either $D_{k+1}$ or $D_{k+1}^{(INV)}$ to be transmitted on the bus. If $N \le N_{INV}$, then $D_{k+1}$ is transmitted on the bus; otherwise, the inverted data $D_{k+1}^{(INV)}$ is sent on the bus.

Recently, a number of techniques have been proposed for bus encoding. Most of these techniques either center on the original bus-invert coding scheme or assume some special data conditions. In this paper, we do not assume any special conditions for the data transitions on the bus; we assume the data to be completely random. We propose a technique that further enhances the default Bus Invert Coding scheme.

4. ShiftInv Coding

The main idea in the proposed technique is to optionally shift the data bits by one bit position (either left-shift or right-shift) if the shifting reduces the number of bus transitions. We define the left-shift operation on a $w$-bit data word as follows:

$d_i^{(LS)} = d_{i-1}$ for $1 \le i < w$, and $d_0^{(LS)} = d_{w-1}$,

i.e., we perform a circular left-shift. This guarantees that we do not lose any information from the original data. The right-shift operation is defined similarly:

$d_i^{(RS)} = d_{i+1}$ for $0 \le i < w-1$, and $d_{w-1}^{(RS)} = d_0$.

Example 4.1. Consider the following example (ignore the extra bits used in the encoding scheme for a moment). Let $B_k$ = 11 (assume an 8-bit bus), and let the new data at time $k+1$ be $D_{k+1}$ = 01 01; therefore, the inverted data is $D_{k+1}^{(INV)}$ = 01 01. For this example, the number of transitions $N$ between $B_k$ and $D_{k+1}$ is 5. In the case of Bus Invert Coding, we would check whether it is beneficial (i.e., whether the number of 0 to 1 and 1 to 0 transitions is reduced) to send $D_{k+1}^{(INV)}$ over the bus. The number of transitions $N_{INV}$ between $B_k$ and $D_{k+1}^{(INV)}$ is 3. Since $N_{INV} < N$, in the Bus Invert Coding technique $D_{k+1}^{(INV)}$ will be sent over the bus at time $k+1$.

Now, let us see what happens when we left-shift the data $D_{k+1}$ once, as defined above. We denote the left-shifted data at time $k+1$ as $D_{k+1}^{(LS)}$. We see that $D_{k+1}^{(LS)}$ = 00 11. Compared with $B_k$ = 11, the number of transitions $N_{LS}$ between $B_k$ and $D_{k+1}^{(LS)}$ is just 1, which is better than the 3 transitions obtained with the inverted data $D_{k+1}^{(INV)}$. Thus, in this case, sending the left-shifted data reduces the number of transitions even further than sending the inverted data.
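The shift definitions and the transition counts of Example 4.1 can be checked with a short sketch. The 8-bit values `B_K` and `D_NEXT` below are hypothetical: they are not the example's own bit patterns, only values chosen so that the counts come out as N = 5, N_INV = 3 and N_LS = 1. The function names are likewise illustrative.

```python
def num_transitions(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ (Hamming distance)."""
    return bin(a ^ b).count("1")


def invert(d: int, w: int) -> int:
    """Bitwise complement of a w-bit word."""
    return ~d & ((1 << w) - 1)


def left_shift(d: int, w: int) -> int:
    """Circular left shift by one: bit i takes d[i-1], bit 0 takes d[w-1]."""
    mask = (1 << w) - 1
    return ((d << 1) | (d >> (w - 1))) & mask


def right_shift(d: int, w: int) -> int:
    """Circular right shift by one: bit i takes d[i+1], bit w-1 takes d[0]."""
    return (d >> 1) | ((d & 1) << (w - 1))


# Hypothetical 8-bit values (an assumption, not taken from the paper).
W = 8
B_K = 0b11001011      # value currently on the bus
D_NEXT = 0b01100101   # new data word

counts = {
    "N": num_transitions(D_NEXT, B_K),
    "N_INV": num_transitions(invert(D_NEXT, W), B_K),
    "N_LS": num_transitions(left_shift(D_NEXT, W), B_K),
    "N_RS": num_transitions(right_shift(D_NEXT, W), B_K),
}
print(counts)  # {'N': 5, 'N_INV': 3, 'N_LS': 1, 'N_RS': 5}
```

Because the shifts are circular, a right shift undoes a left shift, which is what lets the receiver recover the original word.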

The rationale behind using the shift operation is simple. By shifting the data at time $k+1$, it is possible that the bit values match the existing data on the bus at time $k$ in more places than if the data bits were either inverted or left unchanged. Matching bits in more places implies fewer transitions, thereby giving a better solution. The above example illustrates a case where a left-shift operation is better than inversion. Similarly, one can construct examples in which a right-shift operation on the data reduces the number of transitions. Thus, we conclude that one can further reduce the number of transitions by performing either a left-shift or a right-shift operation.

Shifting left or right will not always reduce the number of transitions. Depending on the values of $B_k$ and $D_{k+1}$, it is possible that either the inverted data $D_{k+1}^{(INV)}$ or even the unmodified data $D_{k+1}$ gives the fewest transitions when sent on the bus. For each new data word that needs to be sent over the bus, we evaluate the transitions $N$, $N_{INV}$, $N_{LS}$ and $N_{RS}$ between $B_k$ and $D_{k+1}$, $D_{k+1}^{(INV)}$, $D_{k+1}^{(LS)}$ and $D_{k+1}^{(RS)}$ respectively. We then choose the encoding that results in the least number of transitions. The steps of the proposed technique are outlined below.

    Procedure ShiftInv()
    {
        Input : D_{k+1}, B_k
        Output: B_{k+1}^{(SHIFT_INV)}

        N      ← num_transitions(D_{k+1}, B_k)
        N_INV  ← num_transitions(D_{k+1}^{(INV)}, B_k)
        N_LS   ← num_transitions(D_{k+1}^{(LS)}, B_k)
        N_RS   ← num_transitions(D_{k+1}^{(RS)}, B_k)
        B_{k+1} ← one of (D_{k+1}, D_{k+1}^{(INV)}, D_{k+1}^{(LS)}, D_{k+1}^{(RS)})
                  depending on min(N, N_INV, N_LS, N_RS)
    }

The procedure num_transitions(D, B) returns the number of bit positions in which the vectors D and B differ. Note that the data sent over the bus, $B_{k+1}$, can be one of $D_{k+1}$, $D_{k+1}^{(INV)}$, $D_{k+1}^{(LS)}$ and $D_{k+1}^{(RS)}$. Thus, we need to tag the bus with 2 additional bits that indicate the coding that was used. These bits are used to decode the bus value appropriately at the receiving end. Thus, in ShiftInv coding, the width of the bus is $w' = w + 2$, where $w$ is the width of the data vector: we use 2 additional bits, compared with 1 additional bit in default Bus Invert Coding [1].

5. Hardware Modeling

Figure 1 shows one possible hardware realization of the proposed ShiftInv coding. For illustration purposes, we show a block diagram of ShiftInv coding for 8-bit data. The first set of blocks indicates the modes involved in ShiftInv coding. Note that the left-shift and right-shift blocks do not require any additional hardware; they can be realized by a mere rearrangement of the data bits D. The 8-bit inverter produces the inverted data, since inversion of the data bits is one of the coding modes.

Table 1. Bit representation for ShiftInv coding

    Mode                     SHIFT_INV
    Default (no encoding)    00
    Left-Shift               01
    Right-Shift              10
    Invert                   11

The 2 bits labeled SHIFT_INV$_k$ are the 2 additional bits used to indicate the scheme used at time instance $k$. Table 1 shows the bit representation that indicates the coding scheme used. For example, if the input data is left-shifted before it is sent over the bus, the flag SHIFT_INV is assigned the value 01. The other bit assignments can be read from the table in the same way.

The block called XOR_ADD is the hardware version of the procedure num_transitions() used in procedure ShiftInv(). It first performs an XOR of the 2 input vectors. The total number of 1s in the XOR output is the number of positions in which the two input vectors differ. An adder circuit then counts the total number of 1s in the XORed output. Thus the output of XOR_ADD is the equivalent of procedure num_transitions().
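The procedure and the 2-bit tag can be put together in a small encoder/decoder sketch. Several details here are assumptions rather than part of the paper: the function names, the tie-breaking order (the first minimum in the candidate list wins), and the tag value 10 for right-shift, taken as the only 2-bit code not used by the other three modes. As in procedure ShiftInv(), only the data bits are compared here.

```python
DEFAULT, LEFT, RIGHT, INVERT = 0b00, 0b01, 0b10, 0b11  # 2-bit SHIFT_INV tags


def num_transitions(a: int, b: int) -> int:
    """XOR the two words, then count the 1s (what an XOR_ADD block computes)."""
    return bin(a ^ b).count("1")


def rotl(d: int, w: int) -> int:
    """Circular left shift of a w-bit word by one position."""
    return ((d << 1) | (d >> (w - 1))) & ((1 << w) - 1)


def rotr(d: int, w: int) -> int:
    """Circular right shift of a w-bit word by one position."""
    return (d >> 1) | ((d & 1) << (w - 1))


def shiftinv_encode(d_next: int, b_k: int, w: int) -> tuple[int, int]:
    """Return (tag, word): the candidate with the fewest transitions against b_k."""
    mask = (1 << w) - 1
    candidates = [
        (DEFAULT, d_next),
        (LEFT, rotl(d_next, w)),
        (RIGHT, rotr(d_next, w)),
        (INVERT, ~d_next & mask),
    ]
    # min() keeps the first minimum, so ties favour earlier entries (an assumption).
    return min(candidates, key=lambda c: num_transitions(c[1], b_k))


def shiftinv_decode(tag: int, word: int, w: int) -> int:
    """Undo the transformation named by the tag to recover the original data."""
    if tag == LEFT:
        return rotr(word, w)
    if tag == RIGHT:
        return rotl(word, w)
    if tag == INVERT:
        return ~word & ((1 << w) - 1)
    return word


tag, word = shiftinv_encode(0b01100101, 0b11001011, 8)
print(f"{tag:02b} {word:08b}")  # 01 11001010
```

The decoder needs only the tag and the received word, which is why the receiving end does not have to track the previous bus value.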

Since we have a total of 10 bits (8 for data D and 2 for SHIFT_INV), this block is termed a 10-bit XOR_ADD. We need four XOR_ADD blocks. It can be seen that each XOR_ADD block also has 2 additional bits as input. These 2 bits will be one of {00, 01, 10, 11} depending on the coding scheme and can be obtained from Table 1. These values allow one to evaluate the total number of transitions including the SHIFT_INV values. The maximum number of bit positions in which the inputs to an XOR_ADD can differ is 10. Thus the XOR_ADD circuit generates 4 bits to indicate the total number of transitions on the bus for each type of encoding. In general, for $w$-bit data, we need a $(w+2)$-bit XOR_ADD block that generates a $(\log_2 w + 1)$-bit output. The outputs from the XOR_ADD blocks are sent to a 4-way comparator, which finds the mode that has the least number of transitions. The encoded data that has the least number of transitions is then sent over the bus as $B_k$ along with the values for SHIFT_INV$_k$.

[Figure 1. A Hardware Model for SHIFTINV Coding for 8-bit Data. Inputs: $D_k$, $B_{k-1}$ and the 2-bit SHIFT_INV$_{k-1}$. Mode blocks: Default (no logic, 00), Wired Left Shift (no logic, 01), Wired Right Shift (no logic, 10), 8-bit Inverter (11). A 4-way comparator selects the outputs $B_k$ and the 2-bit SHIFT_INV$_k$.]

6. Simulation Results

We implemented the proposed ShiftInv coding technique in C++. As mentioned earlier, no assumptions were made about the nature of the data on the bus: completely random values for the data bus $D_k$ were generated. The bit-width and the simulation time were passed as inputs to the C++ program. We also simulated Bus Invert Coding using the same randomly generated data, so as to compare the performance of ShiftInv Coding with that of Bus Invert Coding. Table 2 summarizes the results for buses whose widths are a power of 2. For each bit-width pattern, 0000 simulation cycles were performed.
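A simulation of this kind can be sketched as follows. This is not the authors' C++ program: the function names, the seed, the cycle count of 20000, the all-zero initial bus state and the tie-breaking order are all assumptions, and the averages it prints are not meant to reproduce the figures in Tables 2 and 3. Following the hardware description, each candidate includes its tag bits (1 invert line for Bus Invert Coding, 2 SHIFT_INV lines for ShiftInv), so transitions on the extra lines are counted too.

```python
import random


def hamming(a: int, b: int) -> int:
    """Number of differing bit positions between two bus words."""
    return bin(a ^ b).count("1")


def rotl(d: int, w: int) -> int:
    return ((d << 1) | (d >> (w - 1))) & ((1 << w) - 1)


def rotr(d: int, w: int) -> int:
    return (d >> 1) | ((d & 1) << (w - 1))


def bus_invert(d: int, prev: int, w: int) -> int:
    """(w+1)-line word: data plus one invert flag stored above the data bits."""
    mask = (1 << w) - 1
    candidates = [d, (1 << w) | (~d & mask)]
    return min(candidates, key=lambda c: hamming(c, prev))


def shift_inv(d: int, prev: int, w: int) -> int:
    """(w+2)-line word: data plus the 2-bit SHIFT_INV tag above the data bits."""
    mask = (1 << w) - 1
    candidates = [
        (0b00 << w) | d,
        (0b01 << w) | rotl(d, w),
        (0b10 << w) | rotr(d, w),
        (0b11 << w) | (~d & mask),
    ]
    return min(candidates, key=lambda c: hamming(c, prev))


def simulate(w: int, cycles: int, seed: int = 0) -> dict[str, float]:
    """Average transitions per cycle for unencoded, BIC and ShiftInv buses."""
    rng = random.Random(seed)
    prev = {"DEF": 0, "BIC": 0, "SINV": 0}   # assumed initial bus state
    total = {name: 0 for name in prev}
    for _ in range(cycles):
        d = rng.getrandbits(w)               # same random word for all schemes
        nxt = {
            "DEF": d,
            "BIC": bus_invert(d, prev["BIC"], w),
            "SINV": shift_inv(d, prev["SINV"], w),
        }
        for name in prev:
            total[name] += hamming(nxt[name], prev[name])
            prev[name] = nxt[name]
    return {name: total[name] / cycles for name in total}


averages = simulate(8, 20000)
print({name: round(value, 2) for name, value in averages.items()})
```

For uniformly random data the unencoded average should sit close to w/2, which gives a quick sanity check on any such simulator.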

Table 2. Simulation results for buses with width $2^n$ (n = 3, 4, 5, 6, 7), in transitions per cycle

    Bit-width    DEF (default, no coding)    BIC (BusInv Coding)    SINV (ShiftInv Coding)
    8            4.00                        3.27                   3.17
    16           8.00                        6.3                    6.60
    32           16.00                       1.23                   13.0
    64           32.00                       29.15                  2.6
    128          64.01                       60.26                  5.9

Table 3. Simulation results for buses with arbitrary width, in transitions per cycle

    Bit-width    DEF (default, no coding)    BIC (BusInv Coding)    SINV (ShiftInv Coding)
    9            4.50                        3.77                   3.61
    13           6.50                        5.53                   5.31
    21           10.50                       9.16                   .3
    29           14.50                       12.3                   12.2
    35           17.50                       15.60                  15.12
    43           21.50                       19.39                  1.0

It can be seen that in all cases the ShiftInv coding reduces the average number of bus transitions per simulation cycle considerably compared with the default (unencoded) scheme. Further, when compared with Bus Invert Coding, ShiftInv coding is clearly the winner. The additional savings are achieved by using just one extra line compared with default Bus Invert Coding.

Table 3 shows the simulation results for buses whose widths are not a power of 2. It is interesting to see that the transition reductions are greater for buses with arbitrary widths than for buses whose widths are a power of 2. We believe that the additional savings suggest that the shift operations are inherently well suited for encoding buses with arbitrary widths, which is generally the case for data buses, especially on-chip data buses.

We can see from the tables that, as the bus width increases, the ShiftInv coding results in a larger absolute reduction in the average number of transitions. Thus, as applications move towards 64 bits and beyond, the ShiftInv coding scheme will be more useful than the traditional Bus Invert Coding. It should also be noted that the number of extra lines for the ShiftInv coding remains at 2 irrespective of the width of the data bus. We claim that the power savings obtained by ShiftInv Coding can more than offset this small increase in the hardware requirements.
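To put the comparison with Bus Invert Coding in numbers, the sketch below computes the absolute and percentage reduction for the rows of Tables 2 and 3 whose BIC and SINV entries are both fully legible in this transcription (widths 8, 9, 13 and 35). The helper name `relative_saving` is illustrative and not from the paper.

```python
def relative_saving(bic: float, sinv: float) -> float:
    """Percentage by which ShiftInv lowers the transitions per cycle below BIC."""
    return 100.0 * (bic - sinv) / bic


# (bit-width, BIC, SINV) in transitions per cycle, copied from Tables 2 and 3.
rows = [
    (8, 3.27, 3.17),
    (9, 3.77, 3.61),
    (13, 5.53, 5.31),
    (35, 15.60, 15.12),
]

for width, bic, sinv in rows:
    print(f"{width:>3}-bit: {bic - sinv:.2f} fewer transitions per cycle, "
          f"{relative_saving(bic, sinv):.1f}% below BIC")
```

On these four rows the reduction relative to Bus Invert Coding is roughly 3 to 4 percent, while the absolute reduction grows with the bus width.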

7. Conclusions and Future Work

We presented a new method for bus encoding using left-shift and right-shift operations. This technique is a simple yet efficient scheme that enhances the Bus Invert Coding technique. On completely random data, the simulation results suggest that the proposed ShiftInv coding reduces the average number of transitions over and above the reduction given by the well-known Bus Invert Coding technique. The proposed coding scheme has little area overhead and does not assume any correlation between the data values. Further, the proposed technique uses only 2 extra lines regardless of the width of the data bus. The ShiftInv coding scheme is also better suited for buses with arbitrary widths.

In future, we plan to investigate the merits of this technique by generating other coding schemes resulting from a combination of the shift and invert operations. We also plan to investigate special purpose applications for which one may get even more savings using ShiftInv Coding. As we are in the deep submicron era, the inter-wire parasitic capacitance is a dominant factor in the energy dissipation of circuits [8]. In future we plan to look at an energy minimization based ShiftInv coding that reduces the total bus energy and not just the number of bus transitions.

From the experimental data in Table 2 and Table 3, we see that the percentage savings are larger for smaller bus widths and smaller for larger bus widths. This is expected behavior, since the proposed ShiftInv coding is an extension of Bus Invert Coding, and the Bus Invert Coding method [1] does not perform well for larger bus widths. As part of our future work, we plan to explore the possibility of applying the shift techniques to other types of coding, such as the source-coding framework [2].

References

[1] M. R. Stan and W. P. Burleson, Bus-Invert coding for low-power I/O, IEEE Trans. on VLSI, vol. 3, pp. 49-58, March 1995.
[2] S. Ramprasad, N. R. Shanbhag, and I. N. Hajj, A coding framework for low power address and data busses, IEEE Trans. on VLSI Systems, vol. 7, pp. 212-221, June 1999.
[3] C. L. Su, C. Y. Tsui, and A. M. Despain, Saving power in the control path of embedded processors, IEEE Design and Test of Computers, vol. 11, no. 4, pp. 24-30, 1994.
[4] L. Benini, G. De Micheli, E. Macii, D. Sciuto, and C. Silvano, Asymptotic zero-transition activity encoding for address buses in low-power microprocessor-based systems, Great Lakes VLSI Symposium, pp. 77-82, Urbana IL, March 1997.
[5] L. Benini, G. De Micheli, E. Macii, M. Poncino, and S. Quer, System-level power optimization of special purpose applications: The beach solution, Proc. Int. Symp. Low Power Electronics Design, pp. 24-29, August 1997.
[6] S. Hong, U. Narayanan, K. S. Chung, and T. Kim, Bus-Invert Coding for Low-Power I/O: A Decomposition Approach, Proc. 43rd IEEE Midwest Symp. Circuits and Systems, August 2000.
[7] Y. Shin, S. I. Chae, and K. Choi, Partial Bus-Invert Coding for Power Optimization of Application-Specific Systems, IEEE Trans. on VLSI Systems, vol. 9, pp. 377-383, April 2001.
[8] P. P. Sotiriadis and A. Chandrakasan, Bus energy minimization by transition pattern coding (TPC) in deep sub-micron technologies, Proc. 2000 IEEE/ACM Int. Conf. Computer-Aided Design, pp. 322-32, November 2000.