Montgomery Modular Exponentiation on Reconfigurable Hardware


Thomas Blum, Christof Paar
Worcester Polytechnic Institute, ECE Department, Worcester, MA, USA

This research was supported in part through an NSF CAREER award #CCR.

Abstract

It is widely recognized that security issues will play a crucial role in the majority of future computer and communication systems. Central tools for achieving system security are cryptographic algorithms. For performance as well as for physical security reasons, it is often advantageous to realize cryptographic algorithms in hardware. In order to overcome the well-known drawback of reduced flexibility that is associated with traditional ASIC solutions, this contribution proposes arithmetic architectures which are optimized for modern field programmable gate arrays (FPGAs). The proposed architectures perform modular exponentiation with very long integers. This operation is at the heart of many practical public-key algorithms such as RSA and discrete logarithm schemes. We combine the Montgomery modular multiplication algorithm with a new systolic array design, which is capable of processing a variable number of bits per array cell. The designs are flexible, allowing any choice of operand and modulus. Unlike previous approaches, we systematically implement and compare several variants of our new architecture for different bit lengths. We provide absolute area and timing measures for each architecture. The results allow conclusions about the feasibility and time-space trade-offs of our architecture for implementation on Xilinx XC4000 series FPGAs. As a major practical result we show that it is possible to implement modular exponentiation at secure bit lengths on a single commercially available FPGA.

1 Introduction

It is widely recognized that security issues will play a crucial role in many future computer and communication systems. Central tools for achieving system security are cryptographic algorithms.
For performance as well as for physical security reasons it is often required to realize cryptographic algorithms in hardware. Traditional ASIC solutions, however, have the well-known drawback of reduced flexibility compared to software solutions. Since modern security protocols are increasingly defined to be algorithm independent, a high degree of flexibility with respect to the cryptographic algorithms is desirable. A promising solution which combines high flexibility with the speed and physical security of traditional hardware is the implementation of cryptographic algorithms on reconfigurable devices such as FPGAs and EPLDs.

In the case of public-key schemes, algorithm independence can mean not only a change of the actual crypto algorithm but also a change of parameters such as bit length, modulus, or exponents. This contribution deals with arithmetic architectures for modular exponentiation with very long integers, which is at the heart of most modern public-key schemes. Most notably, both RSA and discrete logarithm-based schemes (e.g., Diffie-Hellman key exchange or the Digital Signature Algorithm, DSA) require modular long number exponentiation. The challenge at hand is to design such arithmetic architectures for operands with up to 1024 bits on current FPGAs. The very long word lengths prohibit the application of many proposed architectures, as they would result in unrealistically large resource requirements.

In this contribution we derive a modular exponentiation architecture which combines Montgomery's modular reduction scheme and a novel systolic array architecture. The systolic array architecture requires considerably fewer logic resources than many other systolic array architectures for modular arithmetic. This is crucial, as one of our goals was to derive solutions that can fit into a single FPGA. Clearly a design which fits in a single FPGA has many cost and design advantages over multi-FPGA solutions.
Another important objective was to systematically implement various architecture options for different bit lengths.

This contribution is structured as follows. In Section 2, we summarize some of the previous work on modular exponentiation. Section 3 describes algorithms for modular exponentiation and multiplication and some simplifications and speed-ups for their hardware implementation. In this section we also describe some of the relevant features of the Xilinx XC4000 FPGA series. Section 4 outlines our architecture for modular exponentiation. Section 5 briefly describes our methodology and the tools that were used for this research. Section 6 presents the timing and area results obtained. A comparison to other architectures and an outlook conclude this contribution.

2 Previous Work

In the following, we summarize relevant previous work in the field of modular multiplication. Most proposed approaches are based on Montgomery's algorithm [10], either in conjunction with a redundant number representation or in a systolic array architecture. Solutions using other algorithms have also been presented.

To avoid the carry propagation in multiplication/addition architectures, several solutions have been proposed in the literature. They use either Montgomery's algorithm in combination with a redundant radix number system [15, 13, 17, 4, 18, 6] or the Residue Number System [1].

The Research Laboratory of Digital Equipment Corp. in Paris implemented modular exponentiation in architectures on FPGAs [17, 13]. They utilized an array of 16 XILINX 3090 FPGAs. Compared to the XILINX 4000 series in terms of flip-flops, this is equivalent to a chip with 5100 configurable logic blocks (CLBs). In terms of logic resources this is equivalent to a chip of 4000 CLBs. In their work they used several speed-up methods [13], including the Chinese remainder theorem, an asynchronous carry completion adder, and a windowing method. The implementation computes a 970 bit RSA decryption at a rate of 185 kb/s (5.2 ms per 970 bit decryption) and a 512 bit RSA decryption in excess of 300 kb/s (1.7 ms per 512 bit decryption).
A drawback of this solution is that the binary representation of the modulus is hardwired into the logic representation, so that the architecture has to be reconfigured with every new modulus.

There have been a number of proposals for systolic array architectures for modular arithmetic. However, no implementations have been reported to our knowledge. In [5] a VLSI solution is presented where a modular multiplication is calculated in about $4n$ clock cycles ($n$ is the number of bits of the modulus). That is approximately four times more cycles than in a conventional solution. In terms of resources this design would be suitable for FPGA. Similar two-dimensional systolic arrays are presented in [7, 19, 20, 16]. For a radix of two they all propose an $n \times n$ matrix of one bit processing elements. With this configuration $2n$ modular multiplications are calculated at the same time and the theoretical throughput is one modular multiplication per clock cycle. In terms of resources, such a solution is not feasible in either VLSI or FPGA for the bit lengths required in public-key algorithms. Even implementing only one row of processing elements (resulting in $n$ times slower throughput) in presently available FPGAs is difficult in terms of resources. We tried to overcome the shortage of resources per chip by using larger processing elements and thus saving overhead.

Reference [2] provides a good overview of previously presented architectures for VLSI implementations of modular integer arithmetic. Reference [3] summarizes the chips available in 1990 for performing RSA encryption.

More recently an approach [23] has been presented that utilizes precomputed complements of the modulus and is based on the iterative Horner's rule. In contrast to Montgomery's algorithm, these approaches use the most significant bits of an intermediate result to decide which multiples of the modulus to subtract. The drawback of these solutions is that they need either a large amount of storage space or many clock cycles to complete a modular multiplication.
The authors attempted to overcome the latter problem by a higher clock frequency, which is possible due to a simplified modulo reduction operation.

3 Preliminaries

3.1 Modular Exponentiation and RSA

We start this section with a short description of the RSA algorithm, proposed by Rivest, Shamir and Adleman [12] in 1978. The algorithm is based on modular exponentiation of integers. The private key of a user consists of two large primes $p$ and $q$ and an exponent $D$. The public key consists of the modulus $N = p \cdot q$ and an exponent $E$ such that $E = D^{-1} \bmod (p-1)(q-1)$. In the remainder of the article we always assume that $N$ can be represented by $n$ bits. To encrypt a message $X$ the user computes:

$Y = X^E \bmod N$

Decryption is done by calculating:

$X = Y^D \bmod N$

The identical operations are utilized for the RSA digital signature scheme. In order to thwart currently known attacks, the modulus $N$, and thus $X$ and $Y$, should have a sufficient bit length. Both encryption and decryption require algorithms for computing a modular exponentiation. This can be realized by using the square and multiply algorithm [14]. To compute squaring and multiplication in parallel we can use the following version [20]:

Algorithm 1: computes $P = X^E \bmod N$, where $E = \sum_{i=0}^{n-1} e_i 2^i$, $e_i \in \{0, 1\}$.
1. $P_0 = 1$, $Z_0 = X$
2. for $i = 0$ to $n-1$ do
3. $Z_{i+1} = Z_i^2 \bmod N$
4. if $e_i = 1$ then $P_{i+1} = P_i \cdot Z_i \bmod N$

Algorithm 1 takes $2n$ operations in the worst case and $1.5n$ on average. For speeding up encryption the use of a short exponent $E$ has been proposed [8]. Recommended by ITU is the Fermat prime $F_4 = 2^{16} + 1$. Using $F_4$, the encryption is executed in only 17 operations. Other short exponents proposed include $E = 3$ and $E = 17$. Obviously the same trick cannot be used for decryption, as the decryption exponent $D$ must be kept secret. But using the knowledge of the factors of $N = q \cdot p$, the Chinese Remainder Theorem [11] can be applied by the decrypting party. In this case two modular exponentiations of size $n/2$ and an additional recombination are computed instead of one modular exponentiation of size $n$. Each modular exponentiation of length $n/2$ takes 1/4 of the time required for an $n$ bit exponentiation. If both exponentiations are performed serially, an overall speed-up factor of two is achieved. If they are performed in parallel, a speed-up factor of four is achieved.

3.2 Montgomery Modular Multiplication

As shown in the previous section, modular exponentiation is reduced to a series of modular multiplications and squarings. The algorithm for modular multiplication described below was proposed by P. L. Montgomery in 1985 [10]. Several optimizations were taken from reference [19]:

Algorithm 2: Montgomery Modular Multiplication (radix 2) for computing $A \cdot B \bmod N$, where $B = \sum_{i=0}^{n+1} b_i 2^i$, $b_i \in \{0, 1\}$, $b_0 = 0$, and $A = \sum_{i=0}^{n+2} a_i 2^i$, $a_i \in \{0, 1\}$, $a_{n+1} = 0$, $a_{n+2} = 0$.
1. $R_0 = 0$
2. for $i = 0$ to $n+2$ do
3. $q_i = R_i(0)$ (the least significant bit of $R_i$)
4. $R_{i+1} = (R_i + a_i \cdot B + q_i \cdot N)/2$

$B$ is shifted up one bit with $b_0 = 0$. This measure simplifies the computation of $q_i$ compared to the original algorithm. The loop of Algorithm 2 is executed three more times than originally proposed. With this step we make sure the inequalities $R_i < 3N$ and $R_{n+3} < 2N$ always hold.
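The loop of Algorithm 2 translates almost line by line into software. The following Python function is a minimal sketch of that loop, not of the hardware; the name `montgomery_mul_shifted` and its argument names are assumptions made for illustration. It expects the multiplicand already shifted up by one bit, as the algorithm states, and returns $R_{n+3}$.

```python
def montgomery_mul_shifted(a, b_shifted, modulus, n):
    """Radix-2 Montgomery loop as stated in Algorithm 2 (illustrative sketch).

    a         -- multiplier A, with a < 2**(n + 1) so that a_{n+1} = a_{n+2} = 0
    b_shifted -- multiplicand B already shifted up one bit (b_0 = 0)
    modulus   -- odd modulus N that fits in n bits
    Returns R_{n+3}, which is congruent to 2**-(n+3) * A * B modulo N.
    """
    assert modulus % 2 == 1 and modulus.bit_length() <= n
    assert b_shifted % 2 == 0 and a < 2 ** (n + 1)
    r = 0                                   # step 1: R_0 = 0
    for i in range(n + 3):                  # step 2: i = 0 .. n+2
        a_i = (a >> i) & 1
        q_i = r & 1                         # step 3: q_i is the LSB of R_i
        r = (r + a_i * b_shifted + q_i * modulus) >> 1   # step 4, exact division
    return r


# Tiny check with N = 5 (n = 3), A = 3 and the operand 4 shifted up to B = 8.
N, n = 5, 3
r = montgomery_mul_shifted(3, 8, N, n)
assert r < 2 * N                            # result can be reused without subtraction
assert (r * 2 ** (n + 3)) % N == (3 * 8) % N
```

Because $a_i \cdot B$ is always even and $N$ is odd, choosing $q_i$ as the least significant bit of $R_i$ makes the sum in step 4 even, so the shift is an exact division. For operands below $2N$ (that is, $B < 4N$ after the shift) the value returned by this sketch stays below $2N$.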
The result of a modular multiplication $R_{n+3}$ can thus be reused as input $A$ and $B$ for the next multiplication. We avoid the originally proposed final comparison and subtraction and make a pipelined execution of the algorithm possible. A precondition for the algorithm to work is that the modulus $N$ has to be relatively prime to the radix. In RSA this is always satisfied, as $N$ is a product of two large primes and therefore odd.

The algorithm above calculates $R_{n+3} = (2^{-n-3} A B) \bmod N$. To get the right result we need an extra Montgomery modular multiplication by $2^{2n+6} \bmod N$. However, if further multiplications are required, as for exponentiation, it is better to premultiply all inputs by the factor $2^{2n+6} \bmod N$. Thus every intermediate result carries a factor $2^{n+3}$. We just need to Montgomery multiply the result by 1 to eliminate that factor.

The final Montgomery multiplication with 1 makes sure our final result is smaller than $N$. Consider Algorithm 2 with $B < 4N$ ($B$ shifted up) and $A = (0, \ldots, 0, 1)$. We will get $R_1 = B/2 < 2N$. As all remaining $a_i = 0$, we get at most $R_{i+1} = (R_i + N)/2 \rightarrow N$. If only one $q_i = 0$ ($i = 1, 2, \ldots, n+2$), then $R_{i+1} = R_i/2 < N$ (probability: $1 - 2^{-(n+2)}$).

The whole computational complexity of Algorithm 2 lies in the three additions of $n$ bit operands for computing $R_{i+1}$. As the propagation of $n$ carries is too slow and an equivalent carry look-ahead logic requires too many resources, two different strategies have been pursued in the literature:
1. Redundant representation: The intermediate results are kept in redundant form. Resolution into binary representation is only done at the very end and for feeding the intermediate result back as $a_i$ in Algorithm 2.
2. Systolic arrays: $n$ processing units calculate 1 bit per clock cycle. The computed carries, $q_i$ and $a_i$ are pumped through the processing units. As these signals have to be distributed only between adjacent processing units, a faster clock speed and a resulting higher throughput should be possible. The cost is a higher latency and possibly more resources.
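How the premultiplication and the final multiplication by 1 fit around Algorithm 1 can be sketched in Python as follows. This is an illustrative software model under stated assumptions, not the authors' implementation: the names `mont_mul` and `mont_exp` are hypothetical, and the loop runs only over the bits of the exponent. The sketch doubles the operand inside `mont_mul`, so in terms of the unshifted operands its effective Montgomery factor is $2^{n+2}$ and its precomputation constant is $2^{2n+4} \bmod N$; the constants $2^{n+3}$ and $2^{2n+6}$ quoted above apply when $B$ denotes the operand as it is fed to the loop.

```python
def mont_mul(a, x, modulus, n):
    """Algorithm 2 with the shift of the second operand done internally.

    For a, x < 2N the result is below 2N and congruent to
    a * x * 2**-(n+2) modulo N (one power of two is absorbed by the shift).
    """
    b = x << 1                              # B shifted up one bit, so b_0 = 0
    r = 0
    for i in range(n + 3):
        r = (r + ((a >> i) & 1) * b + (r & 1) * modulus) >> 1
    return r


def mont_exp(x, e, modulus, n):
    """X**E mod N following Algorithm 1, with all products in Montgomery form."""
    k = pow(2, 2 * (n + 2), modulus)        # precomputation constant of this sketch
    z = mont_mul(x, k, modulus, n)          # Z_0 = X carrying the factor 2**(n+2)
    p = mont_mul(1, k, modulus, n)          # P_0 = 1 carrying the factor 2**(n+2)
    for i in range(e.bit_length()):
        if (e >> i) & 1:
            p = mont_mul(p, z, modulus, n)  # P_{i+1} = P_i * Z_i
        z = mont_mul(z, z, modulus, n)      # Z_{i+1} = Z_i ** 2
    return mont_mul(p, 1, modulus, n)       # multiply by 1 to strip the factor


# Toy RSA-sized check: N = 61 * 53 = 3233 fits in n = 12 bits.
assert mont_exp(65, 17, 3233, 12) == pow(65, 17, 3233)
```

No comparison or subtraction appears between the multiplications: every intermediate value stays below $2N$ and is fed straight back into `mont_mul`, which is the property the text relies on for pipelined execution.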
3.3 Xilinx XC4000 Series FPGAs

In this section we present some of the relevant features of the Xilinx XC4000 Series FPGAs and introduce a metric for FPGA cost and performance evaluation. An FPGA device consists of three types of reconfigurable elements: the Configurable Logic Blocks (CLBs), I/O blocks (IOBs) and routing resources [22]. An XC4000 CLB is made up of 3 look-up tables, two flip-flops and programmable multiplexers. Any boolean function of 5 inputs, any 2 functions of 4 inputs and some functions of up to 9 inputs can be computed in one CLB. The multiplexers can route these signals directly to the outputs or to the flip-flops. In the first case the flip-flops can be utilized to store direct inputs. Programmable routing resources connect the CLBs and IOBs into a network. For signal distribution all over the device there are 8 global nets available.

Another feature of the CLB is its dedicated hardware to accelerate the carry path of adders and counters [22]. An $n$ bit ripple carry adder is implemented in $n/2 + 2$ CLBs. As the carry signal uses dedicated interconnects, there is no routing delay in the path and the total delay is fixed: $t_{pd} = 4.5 + n \cdot 0.35$ [ns]. On-chip RAM reduces the cost of data storage. A single CLB can be used for a $16 \times 2$ bit or $32 \times 1$ bit ROM/RAM or for a $16 \times 1$ bit dual port RAM.

In previous work [20, 19, 4] the gate count model has been used for cost evaluation and the gate delay model for speed evaluation. This is not appropriate for FPGAs. As the functional unit of an FPGA is the CLB, we evaluate the cost (C) in number of CLBs. The operation time (T) consists of logic delay in the CLBs and routing delay and is obtained from Xilinx's Timing Analyzer software. As a third parameter we use the time-area product (TA). It is defined as time multiplied by cost.

4 A New Architecture

4.1 Design Overview

As described in Section 3.2, two principal approaches have been proposed to compute Montgomery modular multiplication. A solution following approach 1 has already been implemented in FPGA [17]. The second approach, using systolic arrays, has drawn considerable attention in the research community. However, no architectures that specifically target FPGAs have been reported, nor are there reports of implementations of such systolic architectures. Our contribution targets these two goals. Our system can be divided hierarchically into three levels:
1. Processing Element: Computes $u$ bits of a modular multiplication.
2. Modular Multiplication: An array of processing elements computes a modular multiplication.
3. Modular Exponentiation: Combines modular multiplications to a modular exponentiation according to Algorithm 1.

In the following we describe the system with a bottom-up approach.

4.2 Processing Elements

A general radix 2 systolic array as proposed in [7, 19, 16, 5] utilizes $n$ times $n$ processing elements. As this approach would result in unrealistically large CLB counts for the bit lengths required in modern public-key schemes, we implemented only one row of processing elements. To further reduce the required number of CLBs we implemented processing elements (units) of $u = 4, 8, 16$ bits. With this approach we need only $n/u$ instead of $n$ processing elements, and a considerable amount of overhead can be saved. Similar to the approach in [19] we compute squarings and multiplications of Algorithm 1 in parallel. As explained in Section 4.3, this measure fully utilizes every cycle.

Figure 1. Processing Element (unit). [Block diagram: registers B_Reg, B+N_Reg, N_Reg, Add_Reg_1, Add_Reg_2, Control_Reg, Result_Reg and the q_i, a_i register; multiplexers Mx_B, Mx_1, Mx_2 and Mx_Res; a $u$-bit adder; a control decoder; ports B_In, N_In, Carry_In, Carry_Out, Result_In, Result_Out, Res_0_In, Res_0_Out, Control_Out, and the q_i, a_i inputs and outputs.]

In the processing elements we need the following registers:
- N-Reg ($u$ bits): storage of the modulus
- B-Reg ($u$ bits): storage of the B multiplier
- B+N-Reg ($u$ bits): storage of the intermediate result $B + N$
- Add-Reg-1 ($u + 1$ bits): storage of the intermediate result
- Add-Reg-2 ($u - 1$ bits): storage of the intermediate result
- Control-Reg (3 bits): control of the multiplexers and clock enables
- $a_i$, $q_i$ (2 bits): multiplier A, quotient Q, according to Algorithm 2
- Result-Reg ($u$ bits): storage of the result at the end of a multiplication

The registers need a total of $(6u + 5)/2$ CLBs. Instead of computing $(R + a_i \cdot B + q_i \cdot N)/2$ in each iteration, we compute $N + B$ once and store the result in the B+N-Reg. Multiplexer Mx_1 selects one of its inputs 0, $N$, $B$, $B + N$ to be added to $R$ according to the value of the binary variables $a_i$ and $q_i$. The additional cost is a $u$-bit register, a slightly more complicated multiplexer Mx_1, and two more clock cycles per multiplication. The advantage is that only a two-operand adder is needed, which can be implemented with the ripple carry adder optimized for the Xilinx XC4000 series (see Section 3.3). Also we need only one carry instead of two between units. The carry propagation delay of a 16 bit adder is equivalent to only one additional CLB delay. The adder can be combined into the CLBs of Add-Reg-1; we therefore need no additional CLBs. An additional register Add-Reg-2 allows storage of a multiplication while a squaring is computed and vice versa.

The decoded control register signals and the $a_i$, $q_i$ signals control the multiplexers Mx_B, Mx_1, Mx_2, Mx_Res and the clock enables of the registers. N-Reg is loaded only when the modulus is changed, B-Reg and B+N-Reg after each completion of Algorithm 2. Mx_1 feeds 0, $B$, $N$ or $B + N$ into the adder according to the $a_i$ and $q_i$ bits. Mx_2 feeds $N$ (for the calculation of $N + B$) or the $u - 1$ most significant bits of the result plus the least significant result bit of the next unit (division by two / shift right) back into the adder. Mx_Res selects either the result of this unit or the one to the left to be stored into Result-Reg.

Theoretically the implementation of the multiplexers and decoders would cost an additional $4u + 4$ CLBs. The possibility of reusing registers for combinatorial logic allows some savings of CLBs. Mx_B and Mx_Res are implemented in the CLBs of B-Reg and Result-Reg, Mx_1 and Mx_2 partially in N-Reg and B+N-Reg. The resulting costs are approximately $3u + 4$ CLBs per $u$-bit processing unit.
We compare this expense to the resources needed for a one bit unit implementation. The $B + N$ register would not be needed, as a ripple carry adder for such a small adder makes no sense. We would need a total of seven bits of register space ($N$, $B$, $a_i$, $q_i$, control (2) and result) and a 4-bit input, 3-bit output (2 carries, result) adder. Together with one or two CLBs for decoding the control word and multiplexing, we would have a total of 6 or 7 CLBs per unit. A device that supports such a 1024 bit implementation would need $6.5 \cdot 10^3$ to $7.5 \cdot 10^3$ CLBs, including overhead.

4.3 Modular Multiplication

Figure 2 shows how the processing elements are connected to an array for computing an $n$ bit modular multiplication.

Figure 2. Systolic array for modular multiplication. [Block diagram: units 0 to $n/u$ connected in a row; the N bus and B bus feed N_In and B_In of every unit; Control_Out, the q_i, a_i outputs and Carry_Out of each unit drive the corresponding inputs of the unit to its left; Res_0_Out and Result_Out drive Res_0_In and Result_In of the unit to its right.]

Starting at the rightmost unit 0, the control word, $a_i$, and $q_i$ are fed into their registers. The adder computes Add-Reg-2 plus $B$/$N$/$B + N$ in one clock cycle according to $a_i$ and $q_i$. The least significant bit of the result is read back as $q_{i+1}$ for the next computation. The resulting carry bit, the control word, $a_i$ and $q_i$ are pumped into the unit to the left, where the same computation takes place in the next clock cycle.

In such a systolic fashion the control word, $a_i$, $q_i$, and the carry bits are pumped from right to left through the whole unit array. The division by two in Algorithm 2 also leads to a shift right operation. The least significant bit of a unit's addition (Res_0) is always fed back into the unit to the right. After a modular multiplication is completed, the results are pumped from left to right through the units and consecutively stored in RAM for further processing.
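The data flow between units can be illustrated with a small software model. The Python sketch below is a simplification under stated assumptions, not a description of the actual hardware: it evaluates the units one after another instead of modelling the clock-by-clock pipeline, the function names are hypothetical, and it uses $\lceil (n+4)/u \rceil$ slices so that the unshifted sum always fits. It shows the three ingredients named above: the Mx_1 selection among 0, $B$, $N$ and the precomputed $B + N$, a single carry bit passed to the left, and the Res_0 bit passed to the right for the division by two.

```python
def sliced_step(r, operand, u, num_units):
    """One iteration of Algorithm 2 done in u-bit slices: returns (r + operand) >> 1."""
    mask = (1 << u) - 1
    carry = 0
    sums = []
    for j in range(num_units):              # unit 0 is the rightmost unit
        s = ((r >> (j * u)) & mask) + ((operand >> (j * u)) & mask) + carry
        sums.append(s & mask)               # u-bit sum kept inside unit j
        carry = s >> u                      # one carry bit goes to the unit on the left
    assert carry == 0                       # the slices were sized to hold the full sum
    result = 0
    for j in range(num_units):
        # Res_0: least significant sum bit of the unit to the left (0 for the last unit)
        res0_left = sums[j + 1] & 1 if j + 1 < num_units else 0
        shifted = (sums[j] >> 1) | (res0_left << (u - 1))
        result |= shifted << (j * u)
    return result


def sliced_mont_mul(a, b_shifted, modulus, n, u):
    """Algorithm 2 using the sliced step and a four-way operand selection."""
    num_units = (n + 4 + u - 1) // u        # assumption of this sketch, see lead-in
    choices = [0, b_shifted, modulus, b_shifted + modulus]  # index = 2 * q_i + a_i
    r = 0
    for i in range(n + 3):
        a_i = (a >> i) & 1
        q_i = r & 1
        r = sliced_step(r, choices[2 * q_i + a_i], u, num_units)
    return r


# The slice width does not change the result.
assert sliced_mont_mul(3, 8, 5, 3, 4) == sliced_mont_mul(3, 8, 5, 3, 8)
```

Precomputing $B + N$ once means every slice performs a single two-operand addition per iteration, which is why one carry bit between neighbouring units is sufficient.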
A single processing element computes $u$ bits of $R_{i+1} = (R_i + a_i \cdot B + q_i \cdot N)/2$ of Algorithm 2. In clock cycle $i$, unit 0 computes bits $0 \ldots u-1$ of $R_i$. In cycle $i+1$, unit 1 uses the resulting carry and computes bits $u \ldots 2u-1$ of $R_i$. Unit 0 uses the right shifted (division by 2) bit $u$ of $R_i$ (Res_0) to compute bits $0 \ldots u-1$ of $R_{i+1}$ in clock cycle $i+2$. Clock cycle $i+1$ is unproductive in unit 0 while waiting for the result of unit 1. This inefficiency is avoided by computing squares and multiplications in parallel according to Algorithm 2. Both $p_{i+1}$ and $z_{i+1}$ depend on $z_i$. We therefore store the intermediate result $z_i$ in the B registers and feed $z_i$ and $p_i$ into the $a_i$ input of the units for squaring and multiplication.

4.4 Modular Exponentiation

Figure 3 shows how the array of units is utilized for modular exponentiation. First, the exponent $E$ and the precomputation factor $2^{2n+6} \bmod N$ are read from I/O and stored into RAM (Exp and Prec). Then the modulus $N$ is read from I/O and fed on the $u$ bit wide N bus to the N registers of the units. These steps have to be executed only if the system parameters need to be changed. Next we read the $X$ value from I/O, $u$ bits per clock cycle, and store it into the dual port (DP) RAM Z. At the same time the precomputation factor $2^{2n+6} \bmod N$ is read from Prec RAM and fed $u$ bits per clock cycle via the B bus to the B registers of the units.

Figure 3. Design for a modular exponentiation. [Block diagram: inputs X_In, N_In, Prec_In and E_In; the array of units with its N_In, B_In, a_i input and Result_Out; dual port RAMs, Prec RAM and Exp RAM; shift registers; the time division multiplexing unit (TDM); and the state machine.]

Execution of Algorithm 1 begins in parallel to the reading of $X$. Initially we have $P_0 = 1$ and $Z_0 = X$. First we multiply both values by the precomputation factor $2^{2n+6} \bmod N$. This is done by time multiplexing $X$ and $(1, 0, \ldots, 0)$ in the time division multiplexing unit (TDM), pumping the result as $a_i$ into the units and multiplying it by $2^{2n+6} \bmod N$, which is already stored in the B registers. The results of the two precomputations are stored into DP RAM Z and DP RAM P.

Squaring is now straightforward: the intermediate result $Z_i$ is always stored into the B registers and into DP RAM Z and fed via $a_i$ back into the units. Multiplication is done almost the same way. $P_{i+1}$ is always computed by feeding $P_i$ into the units, but the result is stored into DP RAM P only if the exponent bit $e_i$ is equal to 1. In this way the last stored $P_i$ is always pumped back into the units.

To eliminate the factor $2^{n+3}$ (see Section 3.2) from the result $P_n$, we compute a final Montgomery multiplication with inputs $P_n$ and 1. The value $(0, 0, \ldots, 0, 1)$ is stored via the B bus into the B registers, and $P_n$ is fed from DP RAM P as $a_i$ into the units.

A full modular exponentiation is computed in $2(n+2)(n+4)$ clock cycles. That is the delay from inserting the first $u$ bits of $X$ into the device until the first $u$ result bits appear at the output. At that point, another $X$ value can enter the device. With a latency of $n/u$ clock cycles the last $u$ bits appear on the output bus.

5 Methodology

In our implementation we adopted the following design flow, which resulted in fast verification of gate level netlists as well as back-annotated designs:
1. Design entry
2. Logic verification
3. Synthesis
4. Place and route
5. Timing verification

The entire design, with the exception of vendor specific soft macros, was entered in VHDL format.
Once the design was developed in VHDL, boolean logic and major timing errors were verified by simulating the gate level description with the Synopsys VHDL analyzer (vhdlan) and VHDL debugger (vhdldbx). The next step involved the synthesis of the VHDL code with the Synopsys Design Compiler (fpga analyzer). The output of this step was an optimized netlist describing the gate level design in XILINX format. The most time consuming step was the compilation of the synthesized design with the place and route tools available from Xilinx. This process was accomplished with the XILINX Design Manager tools version M1.5.19. The final step of the design flow was to verify the design once again, but this time with the physical net, CLB, and pad delays introduced when the design was placed into a specific device. This was accomplished with the same test benches and simulation models that were used during the logic verification stage. Synopsys (vhdldbx) was used once again to verify the back-annotated designs. The timing results in Section 6 were all computed by the Xilinx timing analyzer and verified by the Synopsys VHDL debugger. They were not verified with an actual chip.

6 Results

6.1 Modular Exponentiation

We implemented our design for various bit lengths and unit widths. Table 1 shows our results in terms of used CLBs (C), clock cycle time (T) and the time-area product (TA).

Table 1. CLB usage, minimal clock cycle time, and time-area product of modular exponentiation architectures on Xilinx FPGAs. Columns for each of the 256, 512, 768 and 1024 bit designs: C [CLBs], T [ns], TA [CLB · ns]. (The numerical entries are missing from this transcription.)

The majority of CLBs is expended in the units. In Section 4.2 we derived an approximation of $3u + 4$ CLBs per unit. The overhead consists mainly of RAM, dual port RAM, shift registers, counters and the state machine. An $n$ bit RAM is implemented in $n/32$ CLBs, a dual port RAM in $n/16$ CLBs. Counters and their decoding for addressing RAM and dual port RAM are more costly for larger designs. On the other hand, we used the same state machine for all designs in Table 1.

The clock cycle time T in Table 1 is the propagation delay from B-Reg through Mx_1 and the carries of the adder to the registered carry, plus the setup time of the flip-flop. We compare this delay to the optimal cycle time calculated by the Xilinx timing analyzer: for a 4 bit unit the delay with optimal routing is 10.5 ns (256 and 512 bit designs) and 12.7 ns (768 and 1024 bit designs); for an 8 bit unit 11.2 ns and 13.7 ns; and for a 16 bit unit 12.8 ns and 15.5 ns. The larger designs were implemented in larger FPGA devices featuring different delay specifications. Otherwise we expect the same cycle times for designs with the same unit size.

The additional routing delay is between 50% and 80% above the optimal propagation delay. It remains approximately constant for designs of up to 768 bits and for the 1024 bit design with $u = 4$; it deteriorates for 1024 bit designs with unit sizes $u = 8$ and $u = 16$. The same can be said about the place and route time: we experienced run times of a couple of hours on an AMD K6-2/300 MHz PC for designs of up to 768 bits and for the 1024 bit design with $u = 4$, and up to a week for the 1024 bit designs with $u = 8$ and $u = 16$. Different design methods, such as hard macros for a single unit, would probably improve routing delay and place and route time. The time-area product shows that designs with 8 bit units are generally most efficient.

Table 2. CLB usage and execution time for a full modular exponentiation. Columns for each of the 512, 768 and 1024 bit designs: C [CLBs], T [ms]. (The numerical entries are missing from this transcription.)

Table 2 shows the application of our results to public-key schemes where the Chinese remainder theorem cannot be applied.
A full modular exponentiation with an $n$ bit exponent is computed in $2(n+2)(n+4)$ clock cycles.

6.2 Application to RSA

Table 3 shows our results from the tables above, applied to RSA. The encryption time is calculated for the $F_4$ exponent, requiring $2 \cdot 19(n+4)$ clock cycles. Using the $F_4$ exponent, only one multiplication can be calculated in parallel to a squaring.

Table 3. Application to RSA: Encryption. Columns for each of the 512 and 1024 bit designs: C [CLBs], T [ms]. (The numerical entries are missing from this transcription.)

For decryption we apply the Chinese remainder theorem. We decrypt $n$ bits either with one $n/2$ bit architecture serially, or with two $n/2$ bit architectures in parallel. The first approach uses only half as many resources, the latter is twice as fast.

Table 4. Application to RSA: Decryption. Columns for 512 bit ($2 \times 256$ serial), 512 bit ($2 \times 256$ parallel), 1024 bit ($2 \times 512$ serial) and 1024 bit ($2 \times 512$ parallel): C [CLBs], T [ms]. (The numerical entries are missing from this transcription.)

6.3 Comparison and Outlook

We compare our fastest RSA 512/1024 bit designs of Table 4 to the fastest software and hardware solutions we found in the literature [17, 13, 21]. Our 2.37 ms decryption time is about four times faster than the 512 bit software implementation (9.1 ms) on a 150 MHz Alpha [13]. The fastest 1024 bit software implementation [21], at 43.3 ms running on a PPro 200 based PC, is about 4 times slower than our best result (10.2 ms). The fastest reported hardware design [17] (1.7 ms for a 512 bit modulus and 5.2 ms for a 970 bit modulus) is a factor of 1.4/1.7 faster than ours (9.1 ms for a 970 bit modulus). A drawback of the solution in [17] is, however, that the binary representation of the modulus is hardwired into the logic representation, so that the architecture has to be reconfigured with every new modulus. The user of such an implementation needs to own the full development tools for synthesis, placing and routing of FPGAs if RSA is to be executed with different moduli. Our design stores the modulus, the exponent and the precomputation factor in registers and RAM.
A second advantage of our design is that it is implemented in one device instead of a matrix of 16 devices. Using currently available FPGA technology, however, the design [17] would probably also fit in a single device.
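The cycle counts quoted in Sections 6.1 and 6.2 are easy to reproduce. The following Python sketch is a back-of-the-envelope helper whose function names are hypothetical. It treats an exponentiation as a number of operation slots of $2(n+4)$ cycles each, with $n+2$ slots for a full exponentiation and 19 slots for the $F_4$ exponent, which matches both formulas in the text. The clock period it derives from a quoted time is only an estimate, assuming that the 2.37 ms and 10.2 ms figures belong to the parallel $2 \times n/2$ designs and that the CRT recombination is negligible.

```python
def exponentiation_cycles(n, operations=None):
    """Clock cycles on an n-bit design; each operation slot takes 2 * (n + 4) cycles.

    operations defaults to n + 2, which gives the 2(n+2)(n+4) of a full
    exponentiation; pass 19 for the F_4 encryption exponent.
    """
    slots = n + 2 if operations is None else operations
    return 2 * slots * (n + 4)


def implied_clock_period_ns(time_ms, n, operations=None):
    """Clock period in ns that would produce time_ms on an n-bit design."""
    return time_ms * 1e6 / exponentiation_cycles(n, operations)


print(exponentiation_cycles(256))                   # one half of a 512 bit CRT decryption
print(exponentiation_cycles(1024, operations=19))   # 1024 bit encryption with F_4
print(round(implied_clock_period_ns(2.37, 256), 1))
print(round(implied_clock_period_ns(10.2, 512), 1))
```

Under these assumptions both quoted decryption times correspond to clock periods in the high teens of nanoseconds, which is consistent with the statement in Section 6.1 that routing adds 50% to 80% to the optimal delays of 10.5 ns to 15.5 ns.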

To improve our design in terms of speed, three approaches can be taken:

1. Computation of one bit per processing unit (25% improvement estimated).
2. Montgomery multiplication with a radix $r = 2^k$, $k \geq 2$. Computation of a full modular exponentiation in $O(n^2/k)$ cycles instead of $O(n^2)$.

Both approaches have the major disadvantage that considerably more resources will be used. We will concentrate our future research on trying to implement a higher radix design according to approach 3). The challenge at hand is to accommodate simplifications as proposed in [6] to systolic array and FPGA technology.

References

[1] J. Bajard, L. Didier, and P. Kornerup. An RNS Montgomery modular multiplication algorithm. IEEE Transactions on Computers, 47(7):766-76, July 1998.
[2] T. Beth and D. Gollmann. Algorithm engineering for public key algorithms. IEEE Journal on Selected Areas in Communications, 7(4):458-65, May 1989.
[3] E. Brickell. A survey of hardware implementations of RSA. In Advances in Cryptology - CRYPTO '89. Springer-Verlag, 1990.
[4] S. E. Eldridge and C. D. Walter. Hardware implementation of Montgomery's modular multiplication algorithm. IEEE Transactions on Computers, 42(6), July 1993.
[5] W. Gai and H. Chen. A systolic linear array for modular multiplication. In 2nd International Conference on ASIC, pages 7 4, 1996.
[6] H. Orup. Simplifying quotient determination in high-radix modular multiplication. In Proceedings 12th Symposium on Computer Arithmetic, pages 193-9, 1995.
[7] K. Iwamura, T. Matsumoto, and H. Imai. Montgomery modular-multiplication method and systolic arrays suitable for modular exponentiation. Electronics and Communications in Japan, Part 3, 77(3):40-51, March 1994.
[8] D. Knuth. The Art of Computer Programming. Volume 2: Seminumerical Algorithms. Addison-Wesley, Reading, Massachusetts, 2nd edition, 1981.
[9] P. Kornerup. A systolic, linear-array multiplier for a class of right-shift algorithms. IEEE Transactions on Computers, 43(8):892-8, August 1994.
[10] P. Montgomery. Modular multiplication without trial division. Mathematics of Computation, 44(170):519-21, April 1985.
[11] J. Quisquater and C. Couvreur. Fast decipherment algorithm for RSA public key cryptosystem. Electronics Letters, 18:905-7, October 1982.
[12] R. Rivest, A. Shamir, and L. Adleman. A method for obtaining digital signatures and public key cryptosystems. Communications of the ACM, 21(2):120-6, Feb.
[13] M. Shand and J. Vuillemin. Fast implementations of RSA cryptography. In Proceedings 11th IEEE Symposium on Computer Arithmetic, 1993.
[14] D. R. Stinson. Cryptography, Theory and Practice. CRC Press, 1995.
[15] N. Takagi. A radix-4 modular multiplication hardware algorithm efficient for iterative modular multiplications. In Proceedings 10th IEEE Symposium on Computer Arithmetic, pages 35-42, 1991.
[16] A. Tiountchik. Systolic modular exponentiation via Montgomery algorithm. Electronics Letters, 34(9):874-5, April 1998.
[17] J. Vuillemin, P. Bertin, D. Roncin, M. Shand, H. Touati, and P. Boucard. Programmable active memories: Reconfigurable systems come of age. IEEE Transactions on VLSI Systems, 4(1):56-69, Mar 1996.
[18] C. Walter. Fast modular multiplication using 2-power radix. International Journal of Computer Mathematics, 39(1-2):21-8, 1991.
[19] C. Walter. Systolic modular multiplication. IEEE Transactions on Computers, 42(3):376-8, March 1993.
[20] P. Wang. New VLSI architectures of RSA public key cryptosystems. In Proceedings of 1997 IEEE International Symposium on Circuits and Systems, volume 3, 1997.
[21] E. D. Win, S. Mister, B. Preneel, and M. Wiener. On the performance of signature schemes based on elliptic curves. In Algorithmic Number Theory Symposium III. Springer-Verlag, 1998.
[22] Xilinx Inc., San Jose, CA. The Programmable Logic Data Book.
[23] J. Yong-Yin and W. Burleson. VLSI array algorithms and architectures for RSA modular multiplication. IEEE Transactions on VLSI Systems, 5(2):211-7, June.
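As an illustration of the higher-radix approach named in the concluding remarks, the following is a minimal software sketch of digit-serial Montgomery multiplication with radix 2^k, together with a square-and-multiply exponentiation built on it. It is not the systolic array architecture of this paper. The function names `mont_mul` and `mont_exp` and the iteration counter are assumptions introduced here, and the counter is only a rough software proxy for hardware clock cycles. The sketch shows that the number of inner-loop iterations per multiplication drops from about n to about n/k, which is the source of the O(n^2/k) estimate.

```python
def mont_mul(a, b, mod, k, n_bits):
    """Return (a*b*R^-1 mod `mod`, iterations), with R = 2^(k*m).

    Hypothetical helper: processes k bits of `a` per loop iteration.
    Requires `mod` odd and 0 <= a, b < mod < 2^n_bits.
    """
    r = 1 << k
    m = -(-n_bits // k)                 # number of radix-2^k digits (ceiling)
    m_prime = (-pow(mod, -1, r)) % r    # -mod^-1 mod r (needs Python 3.8+)
    s = 0
    for i in range(m):
        a_i = (a >> (k * i)) & (r - 1)  # i-th radix-2^k digit of a
        s += a_i * b
        q = ((s & (r - 1)) * m_prime) & (r - 1)  # quotient digit
        s = (s + q * mod) >> k          # exact division by r
    if s >= mod:                        # s < 2*mod, so one subtraction suffices
        s -= mod
    return s, m


def mont_exp(x, e, mod, k):
    """Return (x^e mod `mod`, total inner-loop iterations), for e >= 1."""
    n_bits = mod.bit_length()
    m = -(-n_bits // k)
    R = 1 << (k * m)
    x_bar = (x * R) % mod               # x in Montgomery form
    acc = R % mod                       # 1 in Montgomery form
    cycles = 0
    for bit in bin(e)[2:]:              # left-to-right square and multiply
        acc, c = mont_mul(acc, acc, mod, k, n_bits)
        cycles += c
        if bit == "1":
            acc, c = mont_mul(acc, x_bar, mod, k, n_bits)
            cycles += c
    result, c = mont_mul(acc, 1, mod, k, n_bits)  # leave Montgomery form
    return result, cycles + c
```

Calling `mont_exp(x, e, mod, 1)` and `mont_exp(x, e, mod, 4)` on the same inputs should return the same result, with the second call reporting roughly a quarter of the iterations. In hardware, each radix-2^k iteration needs wider multipliers and a more complex quotient-digit computation, which is consistent with the remark above that considerably more resources will be used.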


More information

Integrated Circuit Design ELCT 701 (Winter 2017) Lecture 1: Introduction

Integrated Circuit Design ELCT 701 (Winter 2017) Lecture 1: Introduction 1 Integrated Circuit Design ELCT 701 (Winter 2017) Lecture 1: Introduction Assistant Professor Office: C3.315 E-mail: eman.azab@guc.edu.eg 2 Course Overview Lecturer Teaching Assistant Course Team E-mail:

More information

Design of Test Circuits for Maximum Fault Coverage by Using Different Techniques

Design of Test Circuits for Maximum Fault Coverage by Using Different Techniques Design of Test Circuits for Maximum Fault Coverage by Using Different Techniques Akkala Suvarna Ratna M.Tech (VLSI & ES), Department of ECE, Sri Vani School of Engineering, Vijayawada. Abstract: A new

More information

Registers. Unit 12 Registers and Counters. Registers (D Flip-Flop based) Register Transfers (example not out of text) Accumulator Registers

Registers. Unit 12 Registers and Counters. Registers (D Flip-Flop based) Register Transfers (example not out of text) Accumulator Registers Unit 2 Registers and Counters Fundamentals of Logic esign EE2369 Prof. Eric Maconald Fall Semester 23 Registers Groups of flip-flops Can contain data format can be unsigned, 2 s complement and other more

More information

BIST-Based Diagnostics of FPGA Logic Blocks

BIST-Based Diagnostics of FPGA Logic Blocks To appear in Proc. International Test Conf., Nov. 1997 BIST-Based Diagnostics of FPGA Logic Blocks Charles Stroud, Eric Lee, Dept. of Electrical Engineering University of Kentucky and Miron Abramovici

More information

Tutorial 11 ChipscopePro, ISE 10.1 and Xilinx Simulator on the Digilent Spartan-3E board

Tutorial 11 ChipscopePro, ISE 10.1 and Xilinx Simulator on the Digilent Spartan-3E board Tutorial 11 ChipscopePro, ISE 10.1 and Xilinx Simulator on the Digilent Spartan-3E board Introduction This lab will be an introduction on how to use ChipScope for the verification of the designs done on

More information

Low Power VLSI CMOS Design An Image Processing Chip for RGB to HSI Conversion

Low Power VLSI CMOS Design An Image Processing Chip for RGB to HSI Conversion Low Power VLSI CMOS Design An Image Processing Chip for RGB to HSI Conversion A.Th. Schwarzbacher 1,2 and J.B. Foley 2 1 Dublin Institute of Technology, Dept. Of Electronic and Communication Eng., Dublin,

More information

Block Diagram. 16/24/32 etc. pixin pixin_sof pixin_val. Supports 300 MHz+ operation on basic FPGA devices 2 Memory Read/Write Arbiter SYSTEM SIGNALS

Block Diagram. 16/24/32 etc. pixin pixin_sof pixin_val. Supports 300 MHz+ operation on basic FPGA devices 2 Memory Read/Write Arbiter SYSTEM SIGNALS Key Design Features Block Diagram Synthesizable, technology independent IP Core for FPGA, ASIC or SoC Supplied as human readable VHDL (or Verilog) source code Output supports full flow control permitting

More information

Cascadable 4-Bit Comparator

Cascadable 4-Bit Comparator EE 415 Project Report for Cascadable 4-Bit Comparator By William Dixon Mailbox 509 June 1, 2010 INTRODUCTION... 3 THE CASCADABLE 4-BIT COMPARATOR... 4 CONCEPT OF OPERATION... 4 LIMITATIONS... 5 POSSIBILITIES

More information