Efficient Realization for A Class of Clock-Controlled Sequence Generators

Huapeng Wu and M. A. Hasan
Department of Electrical and Computer Engineering, University of Waterloo
Waterloo, Ontario, Canada

Abstract

In this article, hardware implementation of the 1-2 sequence generator is discussed. A novel architecture for the 1-2 generator using an extended linear feedback shift register (XLFSR) is presented. Compared to the conventional LFSR based schemes, the proposed scheme is advantageous in the sense that it yields generators of high and constant throughput. When this scheme is used to implement generators in VLSI technologies, low area and power consumption are also expected. Moreover, it has been shown that the proposed 1-2 generators are very suitable for building long Gollmann's cascaded generators.

Key Words: Sequence generator, LFSR, m-sequence, nonuniform decimation, Gollmann's cascaded generator.
I. Introduction

The stream cipher [9] is used in many cryptographic applications because it can operate at a very high data rate. The key component in a stream cipher system is the pseudorandom sequence generator. How to easily generate sequences that are good in the cryptographic sense has long been an interesting research area [9]. Linear feedback shift register (LFSR) based sequence generators are attractive because of their conceptual simplicity and low implementation complexity. This type of generator includes clock-controlled generators, of which the stop-and-go generator [2] and the 1-2 generator (or step-1/step-2 generator) [1] are the most common.

The stop-and-go generator uses two LFSRs, where the output of the first one is used to control the clock of the second. An output bit 1 of the first LFSR causes the second one to shift its state, while an output bit 0 implies that the state of the second LFSR remains unchanged. The output of the second LFSR is then the output of the stop-and-go generator. It has been shown that the repeated bits in the output sequence of a stop-and-go generator improve an attacker's chances of success [6]. The 1-2 generator tries to solve this problem by shifting the second LFSR once when the output bit of the first LFSR is 0, and twice when the output bit of the first LFSR is 1.

For a 1-2 generator, suppose that the original sequence and the control sequence are $\{a_i\}$ and $\{c_i\}$, $i = 0, 1, 2, \ldots$, respectively. Then the generated sequence $\{u_i\}$ is

$$u_i = a_{d_i}, \qquad d_i = \sum_{k=0}^{i-1} (1 + c_k) = i + \sum_{k=0}^{i-1} c_k,$$

where $\{d_i\}$ is the decimation sequence [1]. If $\{a_i\}$ is an m-sequence of period $M = 2^n - 1$ and $\{c_i\}$ has period $T$ with $\gcd(M, d_T) = 1$, where $d_T = \sum_{k=0}^{T-1} (1 + c_k)$, then the 1-2 generator yields the sequence $\{u_i\}$ of maximal period $MT$. Moreover, if every prime factor of $T$ divides $M$, then the linear complexity of $\{u_i\}$ is no less than $nT$ [1, 5].

Such generators can also be cascaded to obtain sequences with increasingly long period and high linear complexity [1]: the clock input of the $k$th register, $k > 1$, is the sum (modulo 2) of the clock input of the $(k-1)$th register and the output of the $(k-1)$th register. Likewise, the output of the cascaded generator is the sum of the clock input of the last register and the output of the last register. Assuming that there are $L$ stages and the control sequence to the first stage is an m-sequence of period $M$, such a cascaded generator has a period of $(2^n - 1)^{L+1}$ and a linear complexity of $n(2^n - 1)^L$.

Conventional LFSR based implementations of 1-2 generators use an extra output buffer, or require two clocks or two LFSRs. As we shall show later, these schemes are not well suited to implementing long Gollmann's cascaded generators in applications where area is of prime concern. In this work, we propose a novel implementation of the 1-2 generator. It has a low space complexity and a high and constant throughput. When it is implemented with VLSI technologies, it can potentially reduce the power consumption. It is also shown that the proposed scheme is especially suitable for building long Gollmann's cascaded generators.

The organization of this article is as follows. In Section II, a brief account of previous schemes for implementing 1-2 generators is given. Then we propose a new structure of the 1-2 generator in Section III. Concluding remarks are given in Section IV.

II. Conventional LFSR based Schemes

From the definition of the 1-2 generator, if both the clock input and the control sequence have the same rate, the generator will have an irregular output bit rate which depends on the appearance of 1's in the control sequence.
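The definition of the 1-2 generator as a nonuniform decimation, and the irregular clocking it implies, can be illustrated with a small software model. The following is a minimal sketch under stated assumptions, not the hardware discussed in this article: the helper names `lfsr_bits` and `one_two_generator` are hypothetical, and the primitive polynomials $x^4 + x + 1$ (original sequence) and $x^3 + x + 1$ (control sequence) are chosen only to keep the example small.

```python
from math import gcd


def lfsr_bits(taps, state, count):
    """Fibonacci LFSR over GF(2): a[i+n] is the XOR of a[i+t] for t in taps."""
    a = list(state)
    n = len(state)
    while len(a) < count:
        i = len(a) - n
        bit = 0
        for t in taps:
            bit ^= a[i + t]
        a.append(bit)
    return a[:count]


def one_two_generator(a, c, count):
    """Return (u, clocks) with u[i] = a[d_i] and d_i = sum over k < i of (1 + c_k).

    a and c each hold one full period; clocks[i] is the number of LFSR
    shifts (clock cycles) spent after producing output bit i.
    """
    u, clocks, d = [], [], 0
    for i in range(count):
        u.append(a[d % len(a)])
        step = 1 + c[i % len(c)]
        clocks.append(step)
        d += step
    return u, clocks


M, T = 15, 7                               # periods of a and c (assumed example sizes)
a = lfsr_bits((0, 1), (1, 0, 0, 0), M)     # x^4 + x + 1: a[i+4] = a[i+1] + a[i]
c = lfsr_bits((0, 1), (1, 0, 0), T)        # x^3 + x + 1: c[i+3] = c[i+1] + c[i]
S = sum(1 + ck for ck in c)                # total shifts per control period
coprime = gcd(S, M) == 1                   # condition for maximal period M*T
u, clocks = one_two_generator(a, c, 2 * M * T)
print("shifts per control period:", S, "coprime with M:", coprime)
print("clock cycles per output bit:", clocks[:T])
```

The list `clocks` makes the irregular rate visible: an output bit costs one shift after a control bit 0 and two shifts after a control bit 1.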
Specifically, the generator outputs one bit per clock cycle when the control bit is 0 and one bit every two clock cycles when the control bit is 1. Consequently, it has two shortcomings: one is the irregularity of the output rate and the other is the reduced throughput relative to the system clock input to the LFSR. To overcome these drawbacks, the following schemes can be used.

A. Two clock scheme

One method to overcome the problem of irregular throughput is to use two clocks, where one clock has half the rate of the other. When the control bit is 0 the slower clock is used as the input clock to the LFSR, and when the control bit is 1 the faster clock is used. A one-bit buffer is required for temporarily storing the output bit of the LFSR, and the output of the generator can then be clocked out from the buffer at the rate of the slower clock. If the slower clock signal is obtained from the system clock source by frequency division, the generator implemented in this way has a low throughput, which is only half of the system clock rate on average. In this case, to match the generator throughput, the rate of the control sequence should also be held at half of the system clock rate.

B. Output buffer scheme

Another method to overcome the irregularity of the output rate and maintain a comparatively high throughput requires a multi-bit buffer at the output end of the LFSR. First the LFSR works with the controlled clock input to yield the required sequence, which enters the buffer at an irregular rate. Then the output bits of the generator are clocked out from the buffer at a slower rate after an initial delay. Here the output buffer functions as a filter for the output of the LFSR to yield the required sequence with a slower output rate. Obviously both the buffer size and the generator throughput depend on the number of 1's in one period of the control sequence, as well as on the distribution and length of the 1-runs of the control sequence. If the generator is controlled by a periodic sequence with equal numbers of 0's and 1's in one period, then it has an output rate which is 2/3 of the system clock rate (see Footnote 1). In this case, if the buffer's output clock rate is faster than 2/3 of the system clock rate, the buffer will eventually run out of data; on the other hand, if the output clock is slower than this rate, the backlog in the buffer will keep growing and the buffer will eventually overflow. Our simulation results (see Figure 1) indicate that the buffer size increases rapidly with the LFSR length $n$ if the control sequence is an m-sequence generated with an LFSR of the same length.

Footnote 1: Note that there are $2^{n-1}$ 1's and $2^{n-1} - 1$ 0's in one period of an m-sequence with $T = 2^n - 1$. If such an m-sequence is used as the control sequence, then the buffer must work at a clock rate of precisely $(2^n - 1)/(2^n + 2^{n-1} - 1)$ of the input clock rate.

Figure 1: LFSR length vs. the minimal buffer length (output buffer scheme).

C. Two LFSR scheme

The generator implemented with the above two schemes yields a constant but lower throughput. One way to avoid this problem is to use two LFSRs [3]. We know that the 2-decimation of an m-sequence is still the same m-sequence but with a different initial phase [10]. Then a 1-2 generator can be built with two identical LFSRs, A and B, both working at the input clock rate. LFSR B has a different initial state from LFSR A, chosen so that LFSR B yields the 2-decimation of the m-sequence generated by LFSR A. The output sequence of the generator consists of bits from both LFSRs. When the control bit is 0 the output bit of LFSR A is chosen as the output bit of the generator; otherwise the output bit of LFSR B is the output bit of the generator. Obviously, the generator has an output rate equal to the input clock rate. One disadvantage of this scheme is its relatively higher complexity, which is more apparent when a cascaded generator is to be used.
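The property behind the two LFSR scheme, namely that the 2-decimation of an m-sequence is the same m-sequence with a different initial phase, can be checked numerically. The sketch below is illustrative only: the helper name `lfsr_bits` and the polynomial $x^4 + x + 1$ are assumptions, and the code does not model the output selection logic of the scheme.

```python
def lfsr_bits(taps, state, count):
    """Fibonacci LFSR over GF(2): a[i+n] is the XOR of a[i+t] for t in taps."""
    a = list(state)
    n = len(state)
    while len(a) < count:
        i = len(a) - n
        bit = 0
        for t in taps:
            bit ^= a[i + t]
        a.append(bit)
    return a[:count]


PERIOD = 15
TAPS = (0, 1)                                  # x^4 + x + 1 (assumed example)
a = lfsr_bits(TAPS, (1, 0, 0, 0), PERIOD)      # one period of the m-sequence

# 2-decimation: take every second bit of the periodic sequence.
b = [a[(2 * i) % PERIOD] for i in range(PERIOD)]

# Find the phase shift t such that b[i] == a[i + t] for all i.
shift = next(
    t for t in range(PERIOD)
    if all(b[i] == a[(i + t) % PERIOD] for i in range(PERIOD))
)

# An identical LFSR started from the first four decimated bits
# reproduces the whole decimated sequence.
b_from_lfsr = lfsr_bits(TAPS, b[:4], PERIOD)
print("phase shift:", shift, "second LFSR initial state:", b[:4])
```

For this polynomial and starting state the decimated sequence is the original one advanced by 8 positions, so a second, identical LFSR only needs a different initial state.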
III. New 1-2 Generator Architecture

A. XLFSR

Given an LFSR with primitive characteristic polynomial $F(x) = \sum_{i=0}^{n} f_i x^i$ and its nonzero initial state, the output is an m-sequence and is given by [10]

$$a_k = \mathrm{Tr}(\beta \alpha^k), \qquad k = 0, 1, 2, \ldots,$$

where $\alpha$ is a root of $F(x)$ and $\beta \in GF(2^n)$. From the above identity we have

$$
\begin{aligned}
a_{k+n+1} &= \mathrm{Tr}(\beta \alpha^{k+n+1}) = \mathrm{Tr}\Big(\beta \alpha^{k+1} \sum_{i=0}^{n-1} f_i \alpha^i\Big) \\
&= f_{n-1}\, \mathrm{Tr}(\beta \alpha^{k+n}) + \sum_{i=0}^{n-2} f_i\, \mathrm{Tr}(\beta \alpha^{k+i+1}) \\
&= \sum_{i=0}^{n-1} \big[ f_{n-1} f_i + (1 - \delta_i) f_{i-1} \big]\, \mathrm{Tr}(\beta \alpha^{k+i}), \qquad (3.1)
\end{aligned}
$$

where $\delta_i$ is the Kronecker function, which is 1 if $i = 0$ and 0 otherwise.

It is a little tricky to see that if a device can be built using (3.1) to produce $a_{k+n+1}$, then it will in effect generate a sequence decimating the m-sequence by 2. Such a device can be realized by an LFSR-style structure, which is shown in Figure 2. We combine this structure with the original LFSR, which yields a new LFSR as shown in Figure 3. This extended LFSR, referred to as XLFSR, can perform two operations: (i) generation of an m-sequence and (ii) decimation of the m-sequence by 2, depending on the positions of the switches. A slightly more complicated version of the XLFSR has been proposed in [7] for finite field exponentiation, where the register can be shifted in both directions.

Figure 2: An LFSR to generate 2-decimation of an m-sequence.

Figure 3: XLFSR: to generate both 1-decimation and 2-decimation of an m-sequence.

When the switches are at the upper positions (dotted lines), the upper portion of the circuit is disconnected and the circuit is just a conventional LFSR generating the output bit. When the switches are at the lower positions (solid lines), the circuit is configured to perform the decimation-by-2 operation and yield the current output. The switches are controlled by the current bit $c_i$ in the control sequence: they are at the upper position when $c_i = 0$, and at the lower position when $c_i = 1$. The switch control circuitry is very simple and is omitted from the figure. Obviously, if we use the bits of the control sequence to control the switches, the XLFSR will work in exactly the same way as a 1-2 generator does.
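The behavior described above can be modeled in software to check that a register which advances by one or two positions per clock, using a second feedback network for $a_{k+n+1}$, reproduces the 1-2 generator output. This is a behavioral sketch under assumptions, not the circuit of Figure 3: the function names are hypothetical, the second feedback network uses the coefficients $f_{n-1} f_i + (1 - \delta_i) f_{i-1}$ obtained by expanding $a_{k+n+1}$ with the LFSR recurrence, and switches and timing are not modeled.

```python
def dot(coeffs, state):
    """Inner product over GF(2)."""
    bit = 0
    for c, s in zip(coeffs, state):
        bit ^= c & s
    return bit


def xlfsr_taps(f):
    """Coefficients g_i = f_{n-1} f_i + (1 - delta_i) f_{i-1} over GF(2).

    f = [f_0, ..., f_{n-1}] are the low coefficients of F(x); f_n = 1 is implied.
    """
    n = len(f)
    return [(f[n - 1] & f[i]) ^ (f[i - 1] if i > 0 else 0) for i in range(n)]


def xlfsr_run(f, state, control):
    """One output bit per clock; the register advances by 1 + c positions."""
    g = xlfsr_taps(f)
    s = list(state)                      # s = (a_k, ..., a_{k+n-1})
    out = []
    for c in control:
        out.append(s[0])
        a_n = dot(f, s)                  # a_{k+n}, the ordinary LFSR feedback
        if c == 0:
            s = s[1:] + [a_n]            # conventional single shift
        else:
            a_n1 = dot(g, s)             # a_{k+n+1}, computed from the same state
            s = s[2:] + [a_n, a_n1]      # shift by two positions in one clock
    return out


def reference_one_two(f, state, control):
    """Decimation definition: u_i = a_{d_i}, d_i = sum over k < i of (1 + c_k)."""
    n = len(f)
    a = list(state)
    while len(a) < 2 * len(control) + n:
        a.append(dot(f, a[-n:]))
    out, d = [], 0
    for c in control:
        out.append(a[d])
        d += 1 + c
    return out


control = [1, 0, 0, 1, 0, 1, 1] * 4              # assumed example control bits
for f in ([1, 1, 0, 0], [1, 0, 0, 1]):           # x^4 + x + 1 and x^4 + x^3 + 1
    assert xlfsr_run(f, [1, 0, 0, 0], control) == reference_one_two(f, [1, 0, 0, 0], control)
```

Both feedback bits are computed from the current register contents, so the double shift needs no intermediate state; this is what allows one output bit per clock regardless of the control bit.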
B. Complexities of XLFSR

Let $W(F)$ denote the Hamming weight of the characteristic polynomial $F(x)$. Then the size complexity of the conventional LFSR with characteristic polynomial $F(x)$ is $W(F) - 2$ XOR gates and $n$ 1-bit registers, while the corresponding XLFSR can be built with [...] XOR gates and [...] 1-bit registers. The switches are very simple (three-state drivers) and we do not take them into consideration. Obviously, when $W(F)$ is not very large the XLFSR does not significantly increase complexity compared to the conventional LFSR. For instance, when $F(x)$ is a primitive pentanomial, the construction of the corresponding XLFSR requires [...] more XOR gates compared to that of the conventional LFSR.

When the switches are at the upper positions (dashed lines), the time delay for generating the output bit is $[W(F) - 2] T_X + T_R$, where $T_X$ is the time delay of one XOR gate and $T_R$ denotes the time delay of a 1-bit register (flip-flop). When the switches are at the lower positions (solid lines), the upper XOR gate network is connected and the total time delay for generating the output bit is [...]. Clearly, the time complexity is only [...] if $F(x)$ is a primitive trinomial. If $F(x)$ is not a trinomial, both the upper and lower XOR gate feedback networks can be implemented in full parallel form and the total time complexity becomes [...], while the size complexity remains the same.

C. High speed XLFSR

For each Fibonacci type LFSR, there is a corresponding high speed Galois type LFSR that can produce the same output sequence [8]. A Galois type XLFSR can be derived from a Galois type LFSR in the same way that the XLFSR has been derived from the Fibonacci type LFSR. This Galois type XLFSR is shown in Figure 4. When used as a sequence generator, it can produce the same output sequence as a (Fibonacci type) XLFSR does.
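The equivalence between the Galois and Fibonacci forms can be checked with a short behavioral model. This is a sketch under assumptions rather than the circuit of Figure 4: the function name `galois_one_two` is hypothetical, the register is modeled as a polynomial modulo $F(x)$ that is multiplied by $x$ once or twice per output bit, the output is taken from the coefficient of $x^0$, and $F(x) = x^4 + x + 1$ is an assumed example.

```python
def galois_one_two(poly, n, state, control):
    """Galois-type model: the state is a polynomial modulo F(x), stored as an integer.

    poly is the bit mask of F(x) including the x^n term; state is a nonzero
    integer below 2**n. Each output bit is followed by 1 + c multiplications by x.
    """
    out = []
    s = state
    for c in control:
        out.append(s & 1)                # coefficient of x^0 is the output bit
        for _ in range(1 + c):
            s <<= 1                      # multiply by x
            if (s >> n) & 1:
                s ^= poly                # reduce modulo F(x)
    return out


POLY, N = 0b10011, 4                     # x^4 + x + 1 (assumed example)

# With an all-zero control sequence the model is a plain Galois LFSR.
plain = galois_one_two(POLY, N, 1, [0] * 15)

# Fibonacci reference for the same polynomial: a[i+4] = a[i+1] + a[i].
fib = [1, 0, 0, 0]
while len(fib) < 15:
    fib.append(fib[-3] ^ fib[-4])

# With a control sequence, the output is the decimation u_i = a_{d_i}.
control = [1, 0, 0, 1, 0, 1, 1]
stepped = galois_one_two(POLY, N, 1, control)
print(plain == fib, stepped)
```

In this model each feedback XOR acts on a single register bit, which corresponds to the observation that the Galois form avoids a chain of XOR gates in the feedback path.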
An advantage of the Galois type XLFSR over the Fibonacci type XLFSR is that the former does not cascade the XOR gates, yielding a higher speed of operation, especially when $F(x)$ is not a trinomial. The size complexity of the Galois type XLFSR is the same as that of the Fibonacci type XLFSR for any $F(x)$.

Figure 4: High speed XLFSR.

D. Comparisons

Comparisons between the proposed scheme and those discussed in Section II are shown in Table 1. In the table, the throughput is given as the average number of output bits per input clock cycle. For example, 1/2 means the throughput is 1 bit every two input clock cycles. In the two-clock scheme, one clock can be simply derived from the other by halving the frequency of the latter, while in the output-buffer scheme the generation of the slower clock can be a little more complex. The precomputation required in the output buffer scheme includes the effort to decide the buffer size and the initial delay. Also note that the LFSR used in the proposed scheme is actually an XLFSR, which is a little more complex than the conventional LFSR. From Table 1 it is clear that the new scheme has advantages over the others in terms of space complexity, throughput, and simplicity of the overall control.

Table 1: Comparisons of the new scheme to three other schemes.

                                Two clock    Output buffer   Two LFSR   Proposed
                                scheme       scheme          scheme     scheme
# of LFSRs (a)                  1            1               2          1 (XLFSR)
Throughput rate                 1/2          2/3             1          1
Extra buffer                    1 bit        yes             none       none
# clock sources                 1            2               1          1
Complexity of overall control   small        moderate        small      very small
Initial delay                   very small   yes             none       none
Precomputation                  none         yes             small      none

(a) The LFSR generating the control sequence is not included here.

E. Cascaded XLFSRs

The use of XLFSRs to build Gollmann's cascaded generator is straightforward. Only $L + 1$ XLFSRs are needed for an $L$-stage cascaded generator and the overall control is very simple. The system is clocked by a single clock source and the throughput of the generator is equal to the clock rate. A cascaded generator of two stages implemented using three XLFSRs is shown in Figure 5. The first XLFSR (far left) is used as an m-sequence generator producing the control sequence for the first stage of the cascade. A delay block is used at each stage for aligning the input to the XLFSR of this stage with its immediate output, and then the input and output bits are added (mod 2) to give the final output bit for this stage [1]. In this way we can build a cascade of $L$ stages where the binary input to the $k$th stage is used to control the XLFSR of this stage and is also added to the immediate output of the XLFSR to give the final output from this stage, which is passed on as the input to the $(k+1)$th stage. The period and linear complexity of $L$-stage cascaded generators are $(2^n - 1)^{L+1}$ and $n(2^n - 1)^L$, respectively [1].

A simple comparison of $L$-stage cascaded generators built with different schemes is shown in Table 2. Note that in both the two-clock and output-buffer schemes an extra buffer is required between any two stages of LFSR and consequently extra delay occurs at every stage.
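The cascade structure described in this section can also be sketched as a behavioral model, in which every stage consumes one input bit and produces one output bit per clock. The code below is illustrative and rests on assumptions: the function names are hypothetical, each XLFSR is modeled simply as a register that advances by $1 + c$ positions per clock, the delay blocks are implicit because the software evaluates the stages in order, and $F(x) = x^3 + x + 1$ with $L = 2$ is an assumed example.

```python
def step_register(f, s, c):
    """Return (output bit, new state) after advancing the register by 1 + c positions.

    f = [f_0, ..., f_{n-1}] are the low coefficients of F(x); s holds n bits.
    """
    out = s[0]
    for _ in range(1 + c):
        fb = 0
        for fi, si in zip(f, s):
            fb ^= fi & si
        s = s[1:] + [fb]
    return out, s


def cascade(f, states, count):
    """states[0] drives the control m-sequence; states[1:] are the L cascade stages."""
    states = [list(s) for s in states]
    out = []
    for _ in range(count):
        # The first register runs as a plain LFSR and supplies the control bit.
        bit, states[0] = step_register(f, states[0], 0)
        for k in range(1, len(states)):
            # The stage input controls the step size and is added to the stage output.
            reg_out, states[k] = step_register(f, states[k], bit)
            bit ^= reg_out
        out.append(bit)
    return out


F = [1, 1, 0]                                    # x^3 + x + 1 (assumed example)
INIT = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]         # L = 2 stages plus the control register
P = 7 ** 3                                       # (2^n - 1)^(L + 1) for n = 3, L = 2
seq = cascade(F, INIT, 2 * P)
print(seq[:7], seq[:P] == seq[P:])
```

The model produces exactly one output bit per loop iteration, mirroring the constant throughput of the XLFSR cascade, and the generated sequence repeats after $(2^n - 1)^{L+1}$ bits.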
Figure 5: Gollmann's cascaded generator built with XLFSR.

Table 2: Comparisons of schemes to build an $L$-stage cascaded generator.

                     Two clock            Output buffer   Two LFSR   Proposed
                     scheme               scheme          scheme     scheme
# of LFSRs (a)       L + 1                L + 1           [...]      L + 1 (XLFSR)
Throughput rate      [...]                [...]           1          1
Extra buffer         [...] bits           yes             none       none
# clock sources      [...]                [...]           [...]      1
Initial delay        [...] clock cycles   yes             none       none
Precomputation       none                 yes             small      none

(a) The LFSR generating the control sequence is included.

IV. Concluding Remarks

In this article, we have presented a novel LFSR style structure for the 1-2 generator. Compared to other conventional LFSR based schemes, the proposed scheme has the merits of high and constant throughput, no initial delay, no precomputation and simple overall control. It has been shown that the proposed scheme is very suitable for building long Gollmann's cascaded generators.

The idea of XLFSR can be easily generalized to a class of LFSR style structures that can achieve forward decimation with two different step sizes. Practical implementation of such a generator using this idea, however, may require complicated circuitry when [...]. Another direction of generalization of XLFSR is to construct a [...] generator where two control bits are required for generating the output bit.
Acknowledgments

This work was supported in parts by CITR, NSERC, and Micronet.

References

[1] D. Gollmann and W. G. Chambers, "Clock-controlled shift registers: a review," IEEE J. SAC, vol. 7, no. 4, pp. 525-533, May 1989.
[2] T. Beth and F. Piper, "The stop-and-go generator," in Eurocrypt 84 (LNCS 209), Berlin: Springer-Verlag, 1985, pp. 88-92.
[3] C. G. Günther, "Alternating step generators controlled by de Bruijn sequences," in EUROCRYPT 87 (LNCS 304), Berlin: Springer-Verlag, 1988, pp. 5-14.
[4] D. Coppersmith, H. Krawczyk, and Y. Mansour, "The shrinking generator," in CRYPTO 93 (LNCS 773), Berlin: Springer-Verlag, 1994, pp. 22-39.
[5] J. J. Golić and M. V. Živković, "On the linear complexity of nonuniformly decimated PN-sequences," IEEE Trans. IT, vol. 34, no. 5, pp. 1077-1079, Sept. 1988.
[6] J. J. Golić, "Linear cryptanalysis of stream ciphers," in Proc. 2nd Int. Workshop on Fast Software Encryption, pp. 154-169, Leuven, Belgium, Dec. 1994.
[7] H. Wu and M. A. Hasan, "Efficient exponentiation in GF(2^n) using dual basis," in Proc. of 18th Biennial Canadian Communication Symposium, Kingston, Canada, 1996, pp. 204-207.
[8] R. E. Ziemer and R. L. Peterson, Digital Communications and Spread Spectrum Systems, Macmillan Publishing Company, New York, 1985.
[9] R. A. Rueppel, Stream Ciphers, Springer-Verlag, Berlin, 1987.
[10] R. J. McEliece, Finite Fields for Computer Scientists and Engineers, Kluwer Academic Publishers, 1987.