Efficient Realization for A Class of Clock-Controlled Sequence Generators

Huapeng Wu and M. A. Hasan
Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario, Canada

Abstract

In this article, hardware implementation of the 1-2 sequence generator is discussed. A novel architecture for the 1-2 generator using an extended linear feedback shift register (XLFSR) is presented. Compared to the conventional LFSR based schemes, the proposed scheme is advantageous in the sense that it yields generators of high and constant throughput. When this scheme is used to implement generators in VLSI technologies, low area and power consumption are also expected. Moreover, it has been shown that the proposed 1-2 generators are very suitable for building long Gollmann's cascaded generators.

Key Words: Sequence generator, LFSR, $m$-sequence, nonuniform decimation, Gollmann's cascaded generator.

I. Introduction

The stream cipher [9] is used in many cryptographic applications because it can operate at a very high data rate. The key component in a stream cipher system is the pseudorandom sequence generator. How to easily generate sequences that are cryptographically good has long been an interesting research area [9]. Linear feedback shift register (LFSR) based sequence generators are attractive because of their conceptual simplicity and low implementation complexity. This type of generator includes clock-controlled generators, of which the stop-and-go generator [2] and the 1-2 generator (or step-1/step-2 generator) [1] are the most common.

The stop-and-go generator uses two LFSRs, where the output of the first one controls the clock of the second. An output bit 1 of the first LFSR causes the second one to shift its state, while 0 implies that the state of the second LFSR remains unchanged. The output of the second LFSR is then the output of the stop-and-go generator. It has been shown that the repeated bits in the output sequence of a stop-and-go generator can improve an attacker's chances of success [6]. The 1-2 generator tries to solve this problem by shifting the second LFSR once when the output bit of the first LFSR is 0, and twice when it is 1.

For a 1-2 generator, suppose that the original sequence and the control sequence are $a = (a_0, a_1, \ldots)$ and $c = (c_0, c_1, \ldots)$, respectively. Then the generated sequence $b = (b_0, b_1, \ldots)$ is

$$ b_i = a_{\sigma(i)}, \qquad \sigma(i) = \sum_{k=0}^{i} d_k, \qquad d_k = 1 + c_k, $$

where $d$ is the decimation sequence [1]. If $a$ is an $m$-sequence of period $M = 2^n - 1$ and $c$ has period $T$ with $\gcd\left(\sum_{k=0}^{T-1} d_k,\, M\right) = 1$, then the 1-2 generator yields the sequence $b$ of maximal period $MT$. Moreover, if every prime factor of $T$ divides $M$, then the linear complexity of $b$ is no less than $nT$ [1, 5].

Such generators can also be cascaded to obtain sequences with increasingly long period and high linear complexity [1]: the clock input of the $i$th register, $i > 1$, is the sum (modulo 2) of the clock input of the $(i-1)$th register and the output of the $(i-1)$th register. Likewise, the output of the cascaded generator is the sum of the clock input of the last register and the output of the last register. Assuming that there are $L$ stages and the control sequence to the first stage is an $m$-sequence of period $2^n - 1$, such a cascaded generator has a period of $(2^n - 1)^{L+1}$ and a linear complexity of $n(2^n - 1)^{L}$.

Conventional LFSR based implementations of 1-2 generators use an extra output buffer, or require two clocks or two LFSRs. As we shall show later, these schemes are not quite suitable for implementing long Gollmann's cascaded generators in applications where area is of prime concern. In this work, we propose a novel implementation of the 1-2 generator. It has a low space complexity and a high and constant throughput. When it is implemented with VLSI technologies, it can potentially reduce the power consumption. It is also shown that the proposed scheme is especially suitable for building long Gollmann's cascaded generators.

The organization of this article is as follows. In Section II, a brief account of previous schemes for implementing 1-2 generators is given. Then we propose a new structure of the 1-2 generator in Section III. Concluding remarks are given in Section IV.

II. Conventional LFSR based Schemes

From the definition of the 1-2 generator, if both the clock input and the control sequence have the same rate, the generator will have an irregular output bit rate which depends on the appearance of 1s in the control sequence.
That is, the generator outputs one bit per clock cycle when the control bit is 0 and one bit every two clock cycles when the control bit is 1. Consequently, it has two shortcomings: one is the irregularity of the output rate, and the other is the reduced throughput relative to the system clock input to the LFSR. To overcome these drawbacks, the following schemes can be used.

A. Two clock scheme

One method to overcome the problem of irregular throughput is to use two clocks, where one clock runs at half the rate of the other. When the control bit is 0 the slower clock is used as the input clock to the LFSR, and when the control bit is 1 the faster clock is used. A one-bit buffer is required for temporarily storing the output bit of the LFSR, and the output of the generator can then be clocked out from the buffer at the rate of the slower clock. If the slower clock signal is obtained from the system clock source by frequency division, the generator implemented in this way suffers a low throughput, which is only half of the system clock rate. In this case, to match the generator throughput, the rate of the control sequence should also be held at half of the system clock rate.

B. Output buffer scheme

Another method to overcome the irregularity of the output rate while maintaining a comparatively high throughput requires a multi-bit buffer at the output end of the LFSR. First the LFSR works with the controlled clock input to yield the required sequence, which enters the buffer at an irregular rate. Then the output bits of the generator are clocked out from the buffer at a slower rate after an initial delay. Here the output buffer functions as a filter for the output of the LFSR to yield the required sequence with a slower output rate. Obviously both the buffer size and the generator throughput depend on the number of 1s in one period of the control sequence, as well as on the distribution and length of the 1-runs of the control sequence. If the generator is controlled by a periodic sequence with equal numbers of 0s and 1s in one period, then each pair of control bits produces two output bits in three clock cycles, so the output rate is $2/3$ of the system clock rate.¹ In this case, if the buffer's output clock rate is faster than $2/3$ of the system clock rate, the buffer will eventually run out of data; on the other hand, if the output clock is slower than this rate, the backlog in the buffer will keep growing and the buffer will eventually overflow. Our simulation results (see Figure 1) indicate that the buffer size increases rapidly with the LFSR length $n$ if the control sequence is an $m$-sequence generated with an LFSR of the same length.

Figure 1: LFSR length vs. the minimal buffer length (output buffer scheme).

¹ Note that there are $2^{n-1}$ 1s and $2^{n-1} - 1$ 0s in one period of an $m$-sequence with $T = 2^n - 1$. If such an $m$-sequence is used as the control sequence, then the buffer must work at a clock rate of precisely $(2^n - 1)/(2^n + 2^{n-1} - 1)$ of the input clock rate.

C. Two LFSR scheme

The generator implemented with the above two schemes yields a constant but lower throughput. One way to avoid this problem is to use two LFSRs [3]. We know that the 2-decimation of an $m$-sequence is still the same $m$-sequence but with a different initial phase [10]. A 1-2 generator can then be built with two identical LFSRs, $A$ and $B$, both working at the input clock rate. LFSR $B$ has a different initial state from LFSR $A$, chosen so that LFSR $B$ yields the 2-decimation of the $m$-sequence generated by LFSR $A$. The output sequence of the generator consists of bits from both LFSRs: when the control bit is 0 the output bit of LFSR $A$ is chosen as the output bit of the generator, otherwise the output bit of LFSR $B$ is chosen. Obviously, the generator has an output rate equal to the input clock rate. One disadvantage of this scheme is its relatively higher complexity, which is more apparent when a cascaded generator is to be used.
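The irregular output rate that motivates the schemes in this section can be reproduced with a small behavioural model. The sketch below is only an illustration under stated assumptions: the function names, the choice $f(x) = x^3 + x + 1$ and the initial states are hypothetical, and the output phase may differ from the paper's indexing by a fixed offset. It clocks a data LFSR once or twice per control bit and counts the clock cycles consumed.

```python
from typing import Iterator, List, Optional


def lfsr_bits(taps: List[int], state: List[int]) -> Iterator[int]:
    """Fibonacci LFSR: yields s_0, s_1, ... with s_{k+n} = XOR of s_{k+i}, i in taps."""
    reg = list(state)
    while True:
        yield reg[0]
        feedback = 0
        for i in taps:
            feedback ^= reg[i]
        reg = reg[1:] + [feedback]


def one_two_generator(a: Iterator[int], c: Iterator[int]) -> Iterator[int]:
    """Clock the data sequence once per control bit 0 and twice per control bit 1."""
    for control_bit in c:
        for _ in range(1 + control_bit):
            out = next(a)          # one LFSR clock cycle
        yield out                  # only the last bit read is output


def smallest_period(bits: List[int], max_p: int) -> Optional[int]:
    """Smallest p <= max_p such that the window repeats with period p."""
    for p in range(1, max_p + 1):
        if all(bits[i] == bits[i + p] for i in range(len(bits) - p)):
            return p
    return None


# Assumed example: f(x) = x^3 + x + 1, i.e. s_{k+3} = s_{k+1} + s_k (period 7).
TAPS = [0, 1]
gen = one_two_generator(lfsr_bits(TAPS, [1, 0, 0]), lfsr_bits(TAPS, [1, 1, 0]))
out = [next(gen) for _ in range(3 * 49)]

# Clock cycles consumed by one period (7 bits) of the control sequence.
ctrl = lfsr_bits(TAPS, [1, 1, 0])
clocks_per_period = sum(1 + next(ctrl) for _ in range(7))

print("output period:", smallest_period(out, 49))
print("throughput: 7 bits per", clocks_per_period, "clock cycles")
```

With these parameters the control sequence has four 1s per period, so seven output bits cost eleven LFSR clock cycles, matching the rate $(2^n - 1)/(2^n + 2^{n-1} - 1)$ for $n = 3$.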

III. New 1-2 Generator Architecture

A. XLFSR

Given an LFSR with primitive characteristic polynomial $f(x) = x^n + \sum_{i=0}^{n-1} f_i x^i$ and its nonzero initial state, the output is an $m$-sequence and is given by [10]

$$ s_k = \mathrm{Tr}(\beta\alpha^{k}), \qquad k = 0, 1, \ldots, $$

where $\alpha$ is a root of $f(x)$ and $\beta \in GF(2^n)$. From the above identity we have

$$ s_{k+n+1} = \sum_{i=0}^{n-1} f_i\,\mathrm{Tr}(\beta\alpha^{k+i+1}) = \sum_{i=1}^{n-1} f_{i-1}\,\mathrm{Tr}(\beta\alpha^{k+i}) + f_{n-1}\,\mathrm{Tr}(\beta\alpha^{k+n}) = \sum_{i=0}^{n-1} \left[ f_{n-1} f_i + (1 - \delta_i) f_{i-1} \right] \mathrm{Tr}(\beta\alpha^{k+i}) \qquad (3.1) $$

where $\delta_i$ is the Kronecker function, which is 1 if $i = 0$ and 0 otherwise (so the term involving $f_{-1}$ vanishes). It is a little tricky to see that if a device can be built using (3.1) to produce $s_{k+n+1}$, then it will in effect generate a sequence decimating the $m$-sequence by 2. Such a device can be realized by an LFSR-style structure, which is shown in Figure 2. We combine this structure with the original LFSR, which yields a new LFSR as shown in Figure 3. This extended LFSR, referred to as XLFSR, can perform two operations: (i) generation of an $m$-sequence and (ii) decimation of the $m$-sequence by 2, depending on the positions of the switches. A slightly more complicated version of the XLFSR has been proposed in [7] for finite field exponentiation, where the register can be shifted in both directions.

Figure 2: An LFSR to generate 2-decimation of an $m$-sequence.

Figure 3: XLFSR: to generate both 1-decimation and 2-decimation of an $m$-sequence.

When the switches are at the upper positions (dotted lines), the upper portion of the circuit is disconnected and the circuit is just a conventional LFSR generating the output bit. When the switches are at the lower positions (solid lines), the circuit is configured to perform the decimation-by-2 operation and yield the current output. The switches are controlled by the current bit $c_k$ in the control sequence: they are at the upper positions when $c_k = 0$, and at the lower positions when $c_k = 1$. The switch control circuitry is very simple and is omitted from the figure. Obviously, if we use the bits of the control sequence to control the switches, the XLFSR will work in exactly the same way as a 1-2 generator does.
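Identity (3.1) can be checked numerically. The following is a behavioural sketch only, not the gate-level circuit of Figure 3: the function names, the example polynomial $f(x) = x^4 + x^3 + 1$ and the register model (an $n$-bit window that slides by one or two positions) are assumptions made for illustration.

```python
from typing import List


def lfsr_sequence(f: List[int], state: List[int], length: int) -> List[int]:
    """s_{k+n} = sum_i f[i] * s_{k+i} (mod 2); f[i] is the coefficient of x^i."""
    s = list(state)
    n = len(f)
    while len(s) < length:
        k = len(s) - n
        s.append(sum(f[i] & s[k + i] for i in range(n)) % 2)
    return s[:length]


def skip_coefficients(f: List[int]) -> List[int]:
    """g_i = f_{n-1} f_i + (1 - delta_i) f_{i-1} (mod 2), the bracket in (3.1)."""
    n = len(f)
    return [(f[n - 1] & f[i]) ^ (f[i - 1] if i > 0 else 0) for i in range(n)]


def xlfsr_step(f: List[int], g: List[int], reg: List[int], control_bit: int) -> List[int]:
    """Slide the n-bit window by one position (control 0) or two (control 1)."""
    n = len(f)
    s_n = sum(f[i] & reg[i] for i in range(n)) % 2       # s_{k+n}, ordinary feedback
    if control_bit == 0:
        return reg[1:] + [s_n]
    s_n1 = sum(g[i] & reg[i] for i in range(n)) % 2      # s_{k+n+1}, via (3.1)
    return reg[2:] + [s_n, s_n1]


# Assumed example: f(x) = x^4 + x^3 + 1, so f_0 = 1 and f_3 = 1.
F = [1, 0, 0, 1]
G = skip_coefficients(F)
S = lfsr_sequence(F, [1, 0, 0, 0], 64)

# Every s_{k+n+1} is reproduced from the window (s_k, ..., s_{k+n-1}) alone.
identity_holds = all(
    sum(G[i] & S[k + i] for i in range(len(F))) % 2 == S[k + len(F) + 1]
    for k in range(len(S) - len(F) - 1)
)
print("coefficients of (3.1):", G, "identity holds:", identity_holds)
```

The point of the model is that a double step needs no intermediate clock cycle: both new bits are linear functions of the current register contents.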

B. Complexities of XLFSR

Let $W(f)$ denote the Hamming weight of the characteristic polynomial $f(x)$. Then the size complexity of the conventional LFSR with characteristic polynomial $f(x)$ is $W(f) - 2$ XOR gates and $n$ 1-bit registers, while the corresponding XLFSR can be built with […] XOR gates and […] 1-bit registers. The switches are very simple (three-state drivers) and we do not take them into consideration. Obviously, when $W(f)$ is not very large, the XLFSR does not significantly increase complexity compared to the conventional LFSR. For instance, when $f(x)$ is a primitive pentanomial, the construction of the corresponding XLFSR requires […] more XOR gates than that of the conventional LFSR.

When the switches are at the upper positions (dashed lines), the time delay for generating $s_{k+n}$ is $[W(f) - 2]\,T_X + T_R$, where $T_X$ is the time delay of one XOR gate and $T_R$ denotes the time delay of a 1-bit register (flip-flop). When the switches are at the lower positions (solid lines), the upper XOR gate network is connected and the total time delay for generating $s_{k+n+1}$ is […]. Clearly, the time complexity is only $T_X + T_R$ if $f(x)$ is a primitive trinomial. If $f(x)$ is not a trinomial, both the upper and lower XOR gate feedback networks can be implemented in full parallel form and the total time complexity becomes […], while the size complexity remains the same.

C. High speed XLFSR

For each Fibonacci type LFSR, there is a corresponding high speed Galois type LFSR that can produce the same output sequence [8]. A Galois type XLFSR can be derived from a Galois type LFSR in the same way that the XLFSR was derived from the Fibonacci type LFSR. This Galois type XLFSR is shown in Figure 4. When used as a sequence generator, it can produce the same output sequence as a (Fibonacci type) XLFSR does.
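The Fibonacci/Galois correspondence for a plain LFSR (not the XLFSR of Figure 4, which cannot be reconstructed here) can be illustrated as follows. This is a sketch under assumptions: the function names and the example polynomial $f(x) = x^4 + x + 1$ are hypothetical, and the Galois register is modelled as multiplication by $x$ modulo $f(x)$.

```python
from typing import List


def galois_step(f: List[int], r: List[int]) -> List[int]:
    """One Galois LFSR step: r(x) <- x * r(x) mod f(x), f[i] = coefficient of x^i."""
    msb = r[-1]
    # Each cell needs at most one XOR, so no XOR gates are cascaded.
    return [msb & f[0]] + [r[i - 1] ^ (msb & f[i]) for i in range(1, len(f))]


def galois_output(f: List[int], r: List[int], length: int) -> List[int]:
    """Output the top register cell at every step."""
    out = []
    for _ in range(length):
        out.append(r[-1])
        r = galois_step(f, r)
    return out


def fibonacci_sequence(f: List[int], state: List[int], length: int) -> List[int]:
    """Fibonacci LFSR: s_{k+n} = sum_i f[i] * s_{k+i} (mod 2)."""
    s = list(state)
    n = len(f)
    while len(s) < length:
        k = len(s) - n
        s.append(sum(f[i] & s[k + i] for i in range(n)) % 2)
    return s[:length]


# Assumed example: f(x) = x^4 + x + 1.
F = [1, 1, 0, 0]
GAL = galois_output(F, [1, 0, 0, 0], 45)
# Seeding the Fibonacci form with the first n Galois output bits aligns the phases.
FIB = fibonacci_sequence(F, GAL[:len(F)], 45)
print("same sequence:", GAL == FIB)
```

Both forms obey the same linear recurrence, so once the initial phase is matched they emit the same bits; the difference lies only in where the XOR gates sit.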

Figure 4: High speed XLFSR.

An advantage of the Galois type XLFSR over the Fibonacci type XLFSR is that the former does not cascade the XOR gates, which yields a higher speed of operation, especially when $f(x)$ is not a trinomial. The size complexity of the Galois type XLFSR is the same as that of the Fibonacci type XLFSR for any $f(x)$.

D. Comparisons

Table 1: Comparisons of the new scheme to three other schemes.

                                Two clock    Output buffer   Two LFSR    Proposed
                                scheme       scheme          scheme      scheme
# of LFSRs (a)                  1            1               2           1 (XLFSR)
Throughput rate                 1/2          2/3             1           1
Extra buffer bit                1            yes             none        none
# clock sources                 1            2               1           1
Complexity of overall control   small        moderate        small       very small
Initial delay                   very small   yes             none        none
Precomputation                  none         yes             small       none

(a) The LFSR generating the control sequence is not included here.

Comparisons between the proposed scheme and those discussed in Section II are shown in Table 1. In the table, the throughput is denoted by the average number of output bits per input clock cycle.

For example, $1/2$ means the throughput is 1 bit every two input clock cycles. In the two-clock scheme, one clock can simply be derived from the other by halving the frequency of the latter, while in the output-buffer scheme the generation of the slower clock can be a little more complex. The precomputation required in the output buffer scheme includes the effort to decide the buffer size and the initial delay. Also note that the LFSR used in the proposed scheme is actually an XLFSR, which is a little more complex than the conventional LFSR. From Table 1 it is clear that the new scheme has advantages over the others in terms of space complexity, throughput, and simplicity of the overall control.

E. Cascaded XLFSRs

The use of XLFSRs to build Gollmann's cascaded generator is straightforward. Only $L + 1$ XLFSRs are needed for an $L$-stage cascaded generator and the overall control is very simple. The system is clocked by a single clock source and the throughput of the generator is equal to the clock rate. A cascaded generator of two stages implemented using three XLFSRs is shown in Figure 5. The first XLFSR (far left) is used as an $m$-sequence generator producing the control sequence for the first stage of the cascade. A delay block is used at each stage to align the input to the XLFSR of this stage with its immediate output, and then the input and output bits are added (mod 2) to give the final output bit for this stage [1]. In this way we can build a cascade of $L$ stages, where the binary input to the $i$th stage is used to control the XLFSR of this stage and is also added to the immediate output of the XLFSR to give the final output from this stage, which is passed on as the input to the $(i+1)$th stage. The period and linear complexity of $L$-stage cascaded generators are $(2^n - 1)^{L+1}$ and $n(2^n - 1)^{L}$, respectively [1]. A simple comparison of $L$-stage cascaded generators built with different schemes is shown in Table 2.
Note that in both the two-clock and output-buffer schemes an extra buffer is required between any two stages of LFSR, and consequently extra delay occurs at every stage.
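The data flow of the cascade can be summarised in a short behavioural model. The sketch below makes several assumptions that are not in the source: the class and function names are hypothetical, every register uses $f(x) = x^3 + x + 1$, the single-cycle double step of the XLFSR is emulated by two ordinary steps, and the delay-alignment blocks are ignored because a software model has no timing.

```python
from typing import List

# Assumed example polynomial for every register: f(x) = x^3 + x + 1.
TAPS = [0, 1]


def step(reg: List[int]) -> List[int]:
    """One ordinary LFSR shift: new bit s_{k+3} = s_{k+1} + s_k."""
    feedback = 0
    for i in TAPS:
        feedback ^= reg[i]
    return reg[1:] + [feedback]


class Cascade:
    """One control register followed by L clock-controlled stages."""

    def __init__(self, control_state: List[int], stage_states: List[List[int]]):
        self.control = list(control_state)
        self.stages = [list(s) for s in stage_states]

    def next_bit(self) -> int:
        x = self.control[0]                 # control bit for the first stage
        self.control = step(self.control)
        for j, reg in enumerate(self.stages):
            for _ in range(1 + x):          # step once on 0, twice on 1
                reg = step(reg)
            self.stages[j] = reg
            x ^= reg[0]                     # stage output = input + register output (mod 2)
        return x                            # output of the last stage


one_stage = Cascade([1, 1, 0], [[1, 0, 0]])
out1 = [one_stage.next_bit() for _ in range(20)]

two_stage = Cascade([1, 1, 0], [[1, 0, 0], [0, 1, 1]])
out2 = [two_stage.next_bit() for _ in range(20)]
print(out1)
print(out2)
```

Each call to `next_bit` corresponds to one system clock cycle, which reflects the claim that the cascade built from XLFSRs delivers one output bit per clock regardless of the number of stages.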

Figure 5: Gollmann's cascaded generator built with XLFSR.

Table 2: Comparisons of schemes to build an $L$-stage cascaded generator.

                       Two clock            Output buffer   Two LFSR    Proposed
                       scheme               scheme          scheme      scheme
# of LFSRs (a)         L+1                  L+1             2L+1        L+1 (XLFSR)
Throughput rate        […]                  […]             1           1
Extra buffer bits      […]                  yes             none        none
# clock sources        […]                  […]             […]         1
Initial delay          […] clock cycles     yes             none        none
Precomputation         none                 yes             small       none

(a) The LFSR generating the control sequence is included.

IV. Concluding Remarks

In this article, we have presented a novel LFSR style structure for the 1-2 generator. Compared to other conventional LFSR based schemes, the proposed scheme has the merits of high and constant throughput, no initial delay, no precomputation and simple overall control. It has been shown that the proposed scheme is very suitable for building long Gollmann's cascaded generators. The idea of XLFSR can be easily generalized to a class of LFSR style structures that can achieve both step-[…] and step-[…] forward decimation. Practical implementation of such a generator using this idea, however, may require complicated circuitry when […]. Another direction of generalization of XLFSR is to construct a […] generator where two control bits are required for generating the output bit.

Acknowledgments

This work was supported in part by CITR, NSERC, and Micronet.

References

[1] D. Gollmann and W. G. Chambers, Clock-controlled shift registers: a review, IEEE J. SAC, vol. 7, no. 4, pp. 525-533, May 1989.
[2] T. Beth and F. Piper, The stop-and-go generator, in Eurocrypt 84, (LNCS 209), Berlin: Springer-Verlag, 1985, pp. 88-92.
[3] C. G. Günther, Alternating step generators controlled by de Bruijn sequences, in EUROCRYPT 87, (LNCS 304), Berlin: Springer-Verlag, 1988, pp. 5-14.
[4] D. Coppersmith, H. Krawczyk, and Y. Mansour, The shrinking generator, in CRYPTO 93, (LNCS 773), Berlin: Springer-Verlag, 1994, pp. 22-39.
[5] J. J. Golić and M. V. Živković, On the linear complexity of nonuniformly decimated PN-sequences, IEEE Trans. IT, vol. 34, no. 5, pp. 1077-1079, Sept. 1988.
[6] J. J. Golić, Linear cryptanalysis of stream ciphers, in Proc. 2nd Int. Workshop on Fast Software Encryption, pp. 154-169, Leuven, Belgium, Dec. 1994.
[7] H. Wu and M. A. Hasan, Efficient exponentiation in $GF(2^n)$ using dual basis, in Proc. of […]th Biennial Canadian Communication Symposium, Kingston, Canada, 1996, pp. 204-207.
[8] R. E. Ziemer and R. L. Peterson, Digital Communications and Spread Spectrum Systems, MacMillan Publishing Company, New York, 1985.
[9] R. A. Rueppel, Stream Ciphers, Springer-Verlag, Berlin, 1987.
[10] R. J. McEliece, Finite Fields for Computer Scientists and Engineers, Kluwer Academic Publishers, 1987.