Montgomery Modular Exponentiation on Reconfigurable Hardware


Thomas Blum
Worcester Polytechnic Institute, ECE Department, Worcester, MA, USA

Christof Paar

The research was supported in part through an NSF CAREER award #CCR

Abstract

It is widely recognized that security issues will play a crucial role in the majority of future computer and communication systems. Central tools for achieving system security are cryptographic algorithms. For performance as well as for physical security reasons, it is often advantageous to realize cryptographic algorithms in hardware. In order to overcome the well-known drawback of reduced flexibility that is associated with traditional ASIC solutions, this contribution proposes arithmetic architectures which are optimized for modern field programmable gate arrays (FPGAs). The proposed architectures perform modular exponentiation with very long integers. This operation is at the heart of many practical public-key algorithms such as RSA and discrete logarithm schemes. We combine the Montgomery modular multiplication algorithm with a new systolic array design, which is capable of processing a variable number of bits per array cell. The designs are flexible, allowing any choice of operand and modulus. Unlike previous approaches, we systematically implement and compare several variants of our new architecture for different bit lengths. We provide absolute area and timing measures for each architecture. The results allow conclusions about the feasibility and time-space trade-offs of our architecture for implementation on Xilinx XC4000 series FPGAs. As a major practical result we show that it is possible to implement modular exponentiation at secure bit lengths on a single commercially available FPGA.

1 Introduction

It is widely recognized that security issues will play a crucial role in many future computer and communication systems. Cryptographic algorithms are a central tool for achieving system security.
For performance as well as for physical security reasons it is often required to realize cryptographic algorithms in hardware. Traditional ASIC solutions, however, have the well-known drawback of reduced flexibility compared to software solutions. Since modern security protocols are increasingly defined to be algorithm independent, a high degree of flexibility with respect to the cryptographic algorithms is desirable. A promising solution which combines high flexibility with the speed and physical security of traditional hardware is the implementation of cryptographic algorithms on reconfigurable devices such as FPGAs and EPLDs.

In the case of public-key schemes, algorithm independence can mean not only a change of the actual crypto algorithm but also a change of parameters such as bit length, modulus, or exponents. This contribution deals with arithmetic architectures for modular exponentiation with very long integers, which is at the heart of most modern public-key schemes. Most notably, both RSA and discrete logarithm-based schemes (e.g., Diffie-Hellman key exchange or the Digital Signature Algorithm, DSA) require modular long number exponentiation. The challenge at hand is to design such arithmetic architectures for operands with up to 1024 bit on current FPGAs. The very long word lengths prohibit the application of many proposed architectures as they would result in unrealistically large resource requirements.

In this contribution we derive a modular exponentiation architecture which combines Montgomery's modular reduction scheme and a novel systolic array architecture. The systolic array architecture requires considerably fewer logic resources than many other systolic array architectures for modular arithmetic. This is crucial, as one of our goals was to derive solutions that can fit into a single FPGA. Clearly a design which fits in a single FPGA has many cost and design advantages over multi FPGA solutions.
Another important objective was to systematically implement various architecture options for different bit lengths.

This contribution is structured as follows. In Section 2, we summarize some of the previous work on modular exponentiation. Section 3 describes algorithms for modular exponentiation and multiplication and some simplifications and speed-ups for their hardware implementation. In this section we also describe some of the relevant features of the Xilinx XC4000 FPGA series. Section 4 outlines our architecture for modular exponentiation. Section 5 briefly describes our methodology and tools that were used for this research. Section 6 of this contribution posts the timing and area results obtained. A comparison to other architectures and an outlook conclude this contribution.

2 Previous Work

In the following, we will summarize relevant previous work in the field of modular multiplication. Most proposed approaches are based on Montgomery's algorithm [10], either in conjunction with a redundant number representation or in a systolic array architecture. Solutions using other algorithms have also been presented.

To avoid the carry propagation in multiplication/addition architectures several solutions have been proposed in the literature. They either use Montgomery's algorithm in combination with a redundant radix number system [15, 13, 17, 4, 18, 6] or the Residue Number System [1].

The Research Laboratory of Digital Equipment Corp. in Paris implemented modular exponentiation in architectures on FPGAs [17, 13]. They utilized an array of 16 XILINX 3090 FPGAs. Compared to the XILINX 4000 series in terms of flip-flops, this is equivalent to a chip with 5100 configurable logic blocks (CLBs). In terms of logic resources this is equivalent to a chip of 4000 CLBs. In their work they used several speed-up methods [13] including the Chinese remainder theorem, an asynchronous carry completion adder, and a windowing method. The implementation computes a 970 bit RSA decryption at a rate of 185 kb/s (5.2 ms per 970 bit decryption) and a 512 bit RSA decryption in excess of 300 kb/s (1.7 ms per 512 bit decryption).
A drawback of this solution is that the binary representation of the modulus is hardwired into the logic representation so that the architecture has to be reconfigured with every new modulus.

There have been a number of proposals for systolic array architectures for modular arithmetic. However, no implementations have been reported to our knowledge. In [5] a VLSI solution is presented where a modular multiplication is calculated in $(4n + 1) \cdot 3n/2$ clock cycles ($n$ is the number of bits of the modulus). That is approximately four times more cycles than in a conventional solution. In terms of resources this design would be suitable for FPGA. Similar two-dimensional systolic arrays are presented in [7, 19, 20, 16]. For a radix of two they all propose an $n \times n$ matrix of one bit processing elements. With this configuration $2n$ modular multiplications are calculated at the same time and the theoretical throughput is one modular multiplication per clock cycle. In terms of resources, such a solution is not feasible in either VLSI or FPGA for the bit length required in public-key algorithms. Even implementing only one row of processing elements (resulting in $n$ times slower throughput) into presently available FPGAs is difficult in terms of resources. We tried to overcome the shortage of resources per chip by using larger processing elements and thus saving overhead.

Reference [2] provides a good overview of previously presented architectures for VLSI implementations of modular integer arithmetic. Reference [3] summarizes the chips available in 1990 for performing RSA encryption.

More recently an approach [23] has been presented that utilizes precomputed complements of the modulus and is based on the iterative Horner's rule. Compared to Montgomery's algorithms these approaches use the most significant bits of an intermediate result to decide which multiples of the modulus to subtract. The drawback of these solutions is that they either need a large amount of storage space or many clock cycles to complete a modular multiplication.
The authors attempted to overcome the latter problem by a higher clock frequency, which is possible due to a simplified modulo reduction operation.

3 Preliminaries

3.1 Modular Exponentiation and RSA

We start this section with a short description of the RSA algorithm, proposed by Rivest, Shamir and Adleman [12] in 1978. The algorithm is based on modular exponentiation of integers. The private key of a user consists of two large primes $p$ and $q$ and an exponent $D$. The public key consists of the modulus $N = p \cdot q$ and an exponent $E$ such that $E = D^{-1} \bmod (p-1)(q-1)$. In the remainder of the article we always assume that $N$ can be represented by $n$ bits. To encrypt a message $X$ the user computes:

$$Y = X^E \bmod N$$

Decryption is done by calculating:

$$X = Y^D \bmod N$$

The identical operations are utilized for the RSA digital signature scheme. In order to thwart currently known attacks, the modulus $N$, and thus $X$ and $Y$, should have a sufficient bit length. Both encryption and decryption require algorithms for computing a modular exponentiation. This can be realized by using the square and multiply algorithm [14]. To compute squaring and multiplication in parallel we can use the following version [20]:

Algorithm 1: computes $P = X^E \bmod N$, where $E = \sum_{i=0}^{n-1} e_i 2^i$, $e_i \in \{0, 1\}$.
1. $P_0 = 1$, $Z_0 = X$
2. for $i = 0$ to $n - 1$ do
3. $Z_{i+1} = Z_i^2 \bmod N$
4. if $e_i = 1$ then $P_{i+1} = P_i \cdot Z_i \bmod N$

Algorithm 1 takes $2n$ operations in the worst case and $1.5n$ on average. For speeding up encryption the use of a short exponent $E$ has been proposed [8]. Recommended by ITU is the Fermat prime $F_4 = 2^{16} + 1$. Using $F_4$, the encryption is executed in only 17 operations. Other short exponents proposed include $E = 3$ and $E = 17$.

Obviously the same trick cannot be used for decryption, as the decryption exponent $D$ must be kept secret. But using the knowledge of the factors of $N = q \cdot p$, the Chinese Remainder Theorem [11] can be applied by the decrypting party. Two $n/2$ size modular exponentiations and an additional recombination are computed in this case instead of one $n$ size modular exponentiation. Each modular exponentiation of length $n/2$ takes $1/4$ of the time required for an $n$ bit exponentiation. If both exponentiations are performed serially, an overall speed-up factor of two is achieved. If they are performed in parallel, a speed-up factor of four is achieved.

3.2 Montgomery Modular Multiplication

As shown in the previous section, modular exponentiation is reduced to a series of modular multiplications and squarings. The algorithm for modular multiplication described below was proposed by P. L. Montgomery in 1985 [10]. Several optimizations were taken from reference [9]:

Algorithm 2: Montgomery Modular Multiplication (radix 2) for computing $A \cdot B \bmod N$, where $B = \sum_{i=0}^{n+1} b_i 2^i$, $b_i \in \{0, 1\}$, $b_0 = 0$, and $A = \sum_{i=0}^{n+2} a_i 2^i$, $a_i \in \{0, 1\}$, $a_{n+1} = 0$, $a_{n+2} = 0$.
1. $R_0 = 0$
2. for $i = 0$ to $n + 2$ do
3. $q_i = R_i(0)$
4. $R_{i+1} = (R_i + a_i \cdot B + q_i \cdot N)/2$

$B$ is shifted up one bit with $b_0 = 0$. This measure simplifies the computation of $q_i$, compared to the original algorithm. The loop of Algorithm 2 is executed three more times than originally proposed. With this step we make sure the inequalities $R_i < 3N$ and $R_{n+3} < 2N$ always hold.
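The steps of Algorithm 2 can be sketched in software as follows. This is only an illustrative model, not the hardware design: the function name `montgomery_multiply` and its argument names are assumptions made for this example, and the operand `b` is expected to be passed in already shifted up by one bit, as the text describes.

```python
def montgomery_multiply(a: int, b: int, modulus: int, n: int) -> int:
    """Radix-2 Montgomery multiplication following the steps of Algorithm 2.

    a       : multiplier with a < 2 * modulus, so bits a_{n+1}, a_{n+2} are zero
    b       : multiplicand already shifted up by one bit (b_0 = 0), b < 4 * modulus
    modulus : odd modulus N that fits in n bits
    Returns R_{n+3}, which is congruent to a * b * 2^-(n+3) modulo N.
    """
    if modulus % 2 == 0:
        raise ValueError("modulus must be odd")
    if b % 2 != 0:
        raise ValueError("b must be even (shifted up, b_0 = 0)")
    r = 0                                     # step 1: R_0 = 0
    for i in range(n + 3):                    # step 2: i = 0 .. n+2
        q = r & 1                             # step 3: q_i = R_i(0)
        a_i = (a >> i) & 1
        r = (r + a_i * b + q * modulus) >> 1  # step 4: the sum is always even
    return r


if __name__ == "__main__":
    N, n = 239, 8                 # small odd modulus, chosen only for illustration
    a, b = 100, 2 * 57            # b is an operand shifted up by one bit
    r = montgomery_multiply(a, b, N, n)
    # Multiplying back by 2^(n+3) recovers a * b modulo N.
    print((r << (n + 3)) % N == (a * b) % N, r < 2 * N)
```

Because `b` is even and `N` is odd, the parity of the sum in step 4 depends only on `R_i`, which is why the quotient bit can be read directly from the least significant bit of `R_i`.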
The result of a modular multiplication $R_{n+3}$ can thus be reused as input $A$ and $B$ for the next multiplication. We avoid the originally proposed final comparison and subtraction and make a pipelined execution of the algorithm possible.

A precondition for the algorithm to work is that the modulus $N$ has to be relatively prime to the radix. In RSA this is always satisfied, as $N$ is a product of two primes and therefore odd.

The algorithm above calculates $R_{n+3} = (2^{-n-3} A B) \bmod N$. To get the right result we need an extra Montgomery modular multiplication by $2^{2n+6} \bmod N$. However, if further multiplications are required, as for exponentiation, it is better to premultiply all inputs by the factor $2^{2n+6} \bmod N$. Thus every intermediate result carries a factor $2^{n+3}$. We just need to Montgomery multiply the result by 1 to eliminate that factor.

The final Montgomery multiplication with 1 makes sure our final result is smaller than $N$. Consider Algorithm 2 with $B < 4N$ ($B$ shifted up) and $A = (0, \ldots, 0, 1)$. We will get $R_1 = B/2 < 2N$. As all remaining $a_i = 0$, we get at most $R_{i+1} = (R_i + N)/2 \to N$. If only one $q_i = 0$ $(i = 1, 2, \ldots, n+2)$, then $R_{i+1} = R_i/2 < N$ (probability: $1 - 2^{-(n+2)}$).

The whole computational complexity of Algorithm 2 lies in the three additions of $n$ bit operands for computing $R_{i+1}$. As the propagation of $n$ carries is too slow and an equivalent carry look ahead logic requires too many resources, two different strategies have been pursued in the literature:
1. Redundant representation: The intermediate results are kept in redundant form. Resolution into binary representation is only done at the very end and for feeding the intermediate result back as $a_i$ in Algorithm 2.
2. Systolic arrays: $n$ processing units calculate 1 bit per clock cycle. The computed carries, $q_i$ and $a_i$ are pumped through the processing units. As these signals have to be distributed only between adjacent processing units, a faster clock speed and a resulting higher throughput should be possible. The cost is a higher latency and possibly more resources.
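The premultiplication by $2^{2n+6} \bmod N$ and the final multiplication by 1 described in Section 3.2 can be combined with Algorithm 1 as in the following sketch. It is a software illustration under stated assumptions, not the hardware data path: the names `mont_mul` and `mont_exp` are invented for this example, the loop runs over the bits actually present in the exponent, and the quotient bit is computed with the general rule $q_i = (R_i + a_i b_0) \bmod 2$ instead of the shifted-$B$ simplification, so that the factor $2^{-(n+3)}$ applies directly to unshifted operands.

```python
def mont_mul(a: int, b: int, modulus: int, n: int) -> int:
    """Return a value congruent to a * b * 2^-(n+3) mod N, for a, b < 2N."""
    r = 0
    b0 = b & 1
    for i in range(n + 3):
        a_i = (a >> i) & 1
        q = (r + a_i * b0) & 1               # general quotient rule (assumption)
        r = (r + a_i * b + q * modulus) >> 1  # the sum is even by choice of q
    return r


def mont_exp(x: int, e: int, modulus: int) -> int:
    """Return a value congruent to x^e mod N (x < N, N odd)."""
    n = modulus.bit_length()
    prec = pow(2, 2 * n + 6, modulus)        # precomputation factor 2^(2n+6) mod N
    z = mont_mul(x, prec, modulus, n)        # Z_0 = X * 2^(n+3)
    p = mont_mul(1, prec, modulus, n)        # P_0 = 1 * 2^(n+3)
    for i in range(e.bit_length()):          # exponent bits, least significant first
        if (e >> i) & 1:
            p = mont_mul(p, z, modulus, n)   # P_{i+1} = P_i * Z_i
        z = mont_mul(z, z, modulus, n)       # Z_{i+1} = Z_i^2
    return mont_mul(p, 1, modulus, n)        # multiply by 1 to remove 2^(n+3)


if __name__ == "__main__":
    N = 3233                                 # toy modulus 61 * 53, illustration only
    print(mont_exp(65, 17, N) % N == pow(65, 17, N))
```

No comparison or subtraction is performed between multiplications: every intermediate value stays below $2N$ and is fed straight back in, which mirrors the argument made in the text for avoiding the final subtraction.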

3.3 Xilinx XC4000 Series FPGAs

In this section we present some of the relevant features of the Xilinx XC4000 Series FPGAs and introduce a metric for FPGA cost and performance evaluation.

An FPGA device consists of three types of reconfigurable elements: the Configurable Logic Blocks (CLBs), I/O blocks (IOBs) and routing resources [22]. An XC4000 CLB is made up of 3 look-up tables, two flip-flops and programmable multiplexers. Any Boolean function of 5 inputs, any 2 functions of 4 inputs and some functions of up to 9 inputs can be computed in one CLB. The multiplexers can route these signals directly to the outputs or to the flip-flops. In the first case the flip-flops can be utilized to store direct inputs. Programmable routing resources connect the CLBs and IOBs into a network. For signal distribution all over the device there are 8 global nets available.

Another feature of the CLB is its dedicated hardware to accelerate the carry path of adders and counters [22]. An $n$ bit ripple carry adder is implemented in $n/2 + 2$ CLBs. As the carry signal uses dedicated interconnects, there is no routing delay in the path and the total delay is fixed: $t_{pd} = 4.5 + n \cdot 0.35$ [ns].

On chip RAM reduces the cost of data storage. A single CLB can be used for a 16 × 2 bit or 32 × 1 bit ROM/RAM or for a 16 × 1 bit dual port RAM.

In previous work [20, 9, 4] the gate count model has been used for cost evaluation and the gate delay model for speed evaluation. This is not appropriate for FPGAs. As the functional unit of an FPGA is the CLB, we evaluate the cost (C) in number of CLBs. The operation time (T) consists of logic delay in the CLBs and routing delay and is obtained from Xilinx's Timing Analyzer software. As a third parameter we use the time area product (TA). It is defined by time multiplied by cost.

4 A New Architecture

4.1 Design Overview

As described in Section 3.2, two principal approaches have been proposed to compute Montgomery modular multiplication. A solution following approach 1 has already been implemented in FPGA [17]. The second approach using systolic arrays has drawn considerable attention in the research community. However, no architectures that specifically target FPGAs have been reported, nor are there reports of implementations of such systolic architectures. Our contribution targets these two goals.

Our system can be divided hierarchically into three levels.
1. Processing Element: Computes $u$ bits of a modular multiplication.
2. Modular Multiplication: An array of processing elements computes a modular multiplication.
3. Modular Exponentiation: Combines modular multiplications to a modular exponentiation according to Algorithm 1.

In the following we describe the system with a bottom-up approach.

4.2 Processing Elements

A general radix 2 systolic array as proposed in [7, 9, 6, 5] utilizes $n$ times $n$ processing elements. As this approach would result in unrealistically large CLB counts for the bit length required in modern public key schemes, we implemented only one row of processing elements. To further reduce the required number of CLBs we implemented processing elements (units) of $u$ = 4, 8, 16 bits. With this approach we need only $n/u$ instead of $n$ processing elements, and a considerable amount of overhead can be saved. Similar to the approach in [9] we compute squarings and multiplications of Algorithm 1 in parallel. As explained in Section 4.3, this measure fully utilizes every cycle.

Figure 1. Processing Element (unit)

In the processing elements we need the following registers:
• N-Reg ($u$ bits): storage of the modulus
• B-Reg ($u$ bits): storage of the B multiplier
• B+N-Reg ($u$ bits): storage of the intermediate result $B + N$
• Add-Reg ($u + 1$ bits): storage of the intermediate result
• Add-Reg-2 ($u - 1$ bits): storage of the intermediate result
• Control-Reg (3 bits): control of the multiplexers and clock enables
• $a_i$, $q_i$ (2 bits): multiplier A, quotient Q, according to Algorithm 2
• Result-Reg ($u$ bits): storage of the result at the end of a multiplication
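The register list can be tallied to see where the $(6u + 5)/2$ CLB figure quoted for the registers comes from. The sketch below assumes two flip-flops per CLB, as stated in Section 3.3; the function names `register_bits` and `register_clbs` are invented for this illustration.

```python
def register_bits(u: int) -> dict:
    """Register widths of one processing element, taken from the list above."""
    return {
        "N-Reg": u,
        "B-Reg": u,
        "B+N-Reg": u,
        "Add-Reg": u + 1,
        "Add-Reg-2": u - 1,
        "Control-Reg": 3,
        "a_i, q_i": 2,
        "Result-Reg": u,
    }


def register_clbs(u: int) -> float:
    """CLBs needed for the registers alone, at two flip-flops per CLB."""
    return sum(register_bits(u).values()) / 2


if __name__ == "__main__":
    for u in (4, 8, 16):
        print(u, sum(register_bits(u).values()), register_clbs(u))
```

The widths add up to $6u + 5$ bits for every unit size, so halving the total gives the quoted register cost.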

The registers need a total of $(6u + 5)/2$ CLBs.

Instead of computing $(R + a_i \cdot B + q_i \cdot N)/2$ in each iteration, we compute $N + B$ once and store the result in the B+N-Reg. Multiplexer Mux_1 selects one of its inputs 0, $N$, $B$, $B + N$ to be added to $R$ according to the value of the binary variables $a_i$ and $q_i$. The additional cost is a $u$ bit register, a slightly more complicated multiplexer Mux_1, and two more clock cycles per multiplication. The advantage is that only a two operand adder is needed, which can be implemented with the ripple carry adder optimized for the Xilinx XC4000 series (see Section 3.3). Also we need only one carry instead of two between units. The carry propagation delay of a 16 bit adder is equivalent to only one additional CLB delay. The adder can be combined into the CLBs of the Add-Reg; we therefore need no additional CLBs. An additional register Add-Reg-2 allows storage of a multiplication while a squaring is computed and vice versa.

The decoded control register signals and the $a_i$, $q_i$ signals control the multiplexers Mux_B, Mux_1, Mux_2, Mux_Res and the clock enables of the registers. N-Reg is loaded only when the modulus is changed, B-Reg and B+N-Reg after each completion of Algorithm 2. Mux_1 feeds 0, $B$, $N$ or $B + N$ into the adder according to the $a_i$ and $q_i$ bits. Mux_2 feeds $N$ (for calculation of $N + B$) or the $u - 1$ most significant bits of the result plus the least significant result bit of the next unit (division by two / shift right) back into the adder. Mux_Res selects either the result of this unit or the one to the left to be stored into Result-Reg.

Theoretically the implementation of the multiplexers and decoders would cost an additional $4u + 4$ CLBs. The possibility of reusing registers for combinatorial logic allows some savings of CLBs. Mux_B and Mux_Res are implemented in the CLBs of B-Reg and Result-Reg, Mux_1 and Mux_2 partially in N-Reg and B+N-Reg. The resulting costs are approximately $3u + 4$ CLBs per $u$ bit processing unit.
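The replacement of the three operand addition by a multiplexer and a two operand adder can be illustrated with a small behavioural model. It is a sketch only, working on full-width integers rather than $u$ bit slices; the names `mux1_select` and `montgomery_step` are invented for this example, and `b` is assumed to be even as in Algorithm 2.

```python
def mux1_select(a_i: int, q_i: int, b: int, n_mod: int, b_plus_n: int) -> int:
    """Model of Mux_1: choose 0, N, B or B+N from the two control bits."""
    return (0, n_mod, b, b_plus_n)[2 * a_i + q_i]


def montgomery_step(r: int, a_i: int, b: int, n_mod: int, b_plus_n: int) -> int:
    """One iteration of Algorithm 2 using a single two operand addition."""
    q_i = r & 1                                    # quotient bit from the LSB of R_i
    addend = mux1_select(a_i, q_i, b, n_mod, b_plus_n)
    return (r + addend) >> 1                       # add once, then shift right


if __name__ == "__main__":
    n_mod, b = 239, 2 * 57
    b_plus_n = b + n_mod                           # computed once per multiplication
    r = 123
    for a_i in (0, 1):
        direct = (r + a_i * b + (r & 1) * n_mod) // 2
        print(montgomery_step(r, a_i, b, n_mod, b_plus_n) == direct)
```

The sum $B + N$ is formed once and then only selected, which is the trade described in the text: one extra register and a wider multiplexer in exchange for a simpler adder.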
We compare this expense to the resources needed for a one bit unit implementation. The B+N register would not be needed, as a ripple carry adder makes no sense for such a small adder. We would need a total of seven bits of register space ($N$, $B$, $a_i$, $q_i$, control (2) and result) and a 4 bit input, 3 bit output (2 carries, result) adder. Together with one or two CLBs for decoding the control word and multiplexing, we would have a total of 6 or 7 CLBs per unit. A device that supports such a 1024 bit implementation would need $6.5 \times 10^3$ to $7.5 \times 10^3$ CLBs, including overhead.

4.3 Modular Multiplication

Figure 2. Systolic Array for modular multiplication

Figure 2 shows how the processing elements are connected to an array for computing an $n$ bit modular multiplication. Starting at the rightmost unit 0, the control word, $a_i$, and $q_i$ are fed into their registers. The adder computes Add-Reg-2 plus $B$/$N$/$B + N$ in one clock cycle according to $a_i$ and $q_i$. The least significant bit of the result is read back as $q_{i+1}$ for the next computation. The resulting carry bit, the control word, $a_i$ and $q_i$ are pumped into the unit to the left, where the same computation takes place in the next clock cycle. In such a systolic fashion the control word, $a_i$, $q_i$, and the carry bits are pumped from right to left through the whole unit array. The division by two in Algorithm 2 also leads to a shift right operation. The least significant bit of a unit's addition (Res_0) is always fed back into the unit to the right. After a modular multiplication is completed, the results are pumped from left to right through the units and consecutively stored in RAM for further processing.
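How the addition and the shift right are split across $u$ bit units can be shown with a functional model. This sketch is not cycle accurate and ignores the control word and the pipelining; the function name `sliced_add_shift` and its parameters are assumptions for this illustration. Carries travel from unit 0 to the left, and each unit receives the least significant sum bit of its left neighbour (Res_0) as its new top bit.

```python
def sliced_add_shift(r: int, addend: int, u: int, units: int) -> int:
    """Compute (r + addend) >> 1 slice by slice, u bits per unit.

    Both operands must fit into units * u bits.
    """
    mask = (1 << u) - 1
    carry = 0
    sums = []
    for j in range(units):                    # carry ripples from unit 0 leftwards
        s = ((r >> (j * u)) & mask) + ((addend >> (j * u)) & mask) + carry
        sums.append(s & mask)
        carry = s >> u
    result = 0
    for j in range(units):
        # Res_0 of the left neighbour; the leftmost unit takes the final carry.
        res0_left = (sums[j + 1] & 1) if j + 1 < units else carry
        shifted = (sums[j] >> 1) | (res0_left << (u - 1))
        result |= shifted << (j * u)
    return result


if __name__ == "__main__":
    r, addend = 0xABCDE, 0x12345
    print(sliced_add_shift(r, addend, 4, 6) == (r + addend) >> 1)
```

In the hardware the two loops are overlapped in time, so unit $j$ works on a given iteration one clock cycle after unit $j - 1$.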
A single processing element computes $u$ bits of $R_{i+1} = (R_i + a_i \cdot B + q_i \cdot N)/2$ of Algorithm 2. In clock cycle $i$, unit 0 computes bits $0 \ldots u-1$ of $R_i$. In cycle $i + 1$, unit 1 uses the resulting carry and computes bits $u \ldots 2u-1$ of $R_i$. Unit 0 uses the right shifted (division by 2) bit $u$ of $R_i$ (Res_0) to compute bits $0 \ldots u-1$ of $R_{i+1}$ in clock cycle $i + 2$. Clock cycle $i + 1$ is unproductive in unit 0 while waiting for the result of unit 1. This inefficiency is avoided by computing squares and multiplications in parallel according to Algorithm 2. Both $P_{i+1}$ and $Z_{i+1}$ depend on $Z_i$. We therefore store the intermediate result $Z_i$ in the B registers and feed $Z_i$ and $P_i$ into the $a_i$ input of the units for squaring and multiplication.

4.4 Modular Exponentiation

Figure 3 shows how the array of units is utilized for modular exponentiation. First, the exponent $E$ and the precomputation factor $2^{2n+6} \bmod N$ are read from I/O and stored into RAM (Exp and Prec). Then the modulus $N$ is read from I/O and fed on the $u$ bit wide N bus to the N registers of the units. These steps have to be executed only if the system parameters need to be changed. Next we read the $X$ value from I/O, $u$ bits per clock cycle, and store it into the dual port (DP) RAM Z. At the same time the precomputation factor $2^{2n+6} \bmod N$ is read from Prec RAM and fed $u$ bits per clock cycle via the B bus to the B registers of the units.

Figure 3. Design for a modular exponentiation

Execution of Algorithm 1 begins in parallel to the reading of $X$. Initially we have $P_0 = 1$ and $Z_0 = X$. First we multiply both values by the precomputation factor $2^{2n+6} \bmod N$. This is done by time multiplexing $X$ and $1, 0, \ldots, 0$ in the time division multiplexing unit (TDM), pumping the result as $a_i$ into the units and multiplying it by $2^{2n+6} \bmod N$, which is already stored in the B registers. The results of the two precomputations are stored into DP RAM Z and DP RAM P.

Squaring is now straightforward: The intermediate result $Z_i$ is always stored into the B registers and into DP RAM Z and fed via $a_i$ back into the units. Multiplication is done almost the same way. $P_{i+1}$ is always computed by feeding $P_i$ into the units, but the result is stored into DP RAM P only if the exponent bit $e_i$ is equal to 1. In this way always the last stored $P_i$ is pumped back into the units.

To eliminate the factor $2^{n+3}$ (see Section 4.3) from the result $P_n$, we compute a final Montgomery multiplication with inputs $P_n$ and 1. The value $0, 0, \ldots, 0, 1$ is stored via the B bus into the B registers, and $P_n$ is fed from DP RAM P as $a_i$ into the units.

A full modular exponentiation is computed in $2(n + 2)(n + 4)$ clock cycles. That is the delay it takes from inserting the first $u$ bits of $X$ into the device until the first $u$ result bits appear at the output. At that point, another $X$ value can enter the device. With a latency of $n/u$ clock cycles the last $u$ bits appear on the output bus.

5 Methodology

In our implementation we adopted the following design flow approach that resulted in fast verification of gate level netlists as well as back annotated designs:
1. Design entry
2. Logic verification
3. Synthesis
4. Place and Route
5. Timing Verification

The entire design, with the exception of vendor specific soft macros, was entered in VHDL format.
Once the design was developed in VHDL, Boolean logic and major timing errors were verified by simulating the gate level description with Synopsys VHDL analyzer (vhdlan) and VHDL debugger (vhdldbx). The next step involved the synthesis of the VHDL code with Synopsys Design Compiler (fpga analyzer). The output of this step was an optimized netlist describing the gate level design in XILINX format. The most time consuming step was the compilation of the synthesized design with the place and route tools available from Xilinx. This process was accomplished with the XILINX Design Manager tools version M1.5.9. The final step of the design flow was to verify the design once again, but this time with the physical net, CLB, and pad delays introduced when the design was placed into a specific device. This was accomplished with the same test benches and simulation models that were used during the logic verification stage. Synopsys (vhdldbx) was used once again to verify back-annotated designs.

The timing results from Section 6 were all computed by the Xilinx timing analyzer and verified by the Synopsys VHDL debugger. They were not verified with an actual chip.

6 Results

6.1 Modular Exponentiation

We implemented our design for various bit lengths and unit widths. Table 1 shows our results in terms of used CLBs (C), clock cycle time (T) and the time area product (TA).

Table 1. CLB usage, minimal clock cycle time, and time area product of modular exponentiation architectures on Xilinx FPGAs. Columns for the 256 bit, 512 bit, 768 bit and 1024 bit designs: C [CLBs], T [ns], TA [CLB · ns].

The majority of CLBs is expended in the units. In Section 4.2 we derived an approximation of $3u + 4$ CLBs per unit. The overhead consists mainly of RAM, dual port RAM, shift registers, counters and the state machine. An $n$ bit RAM is implemented in $n/32$ CLBs, a dual port RAM in $n/16$ CLBs. Counters and their decoding for addressing RAM and dual port RAM are more costly for larger designs. On the other hand, we used the same state machine for all designs in Table 1.

The clock cycle time T in Table 1 is the propagation delay from B-Reg through Mux_1 and the carries of the adder to the registered carry, plus the setup time of the flip-flop. We compare this delay to the optimal cycle time calculated by the Xilinx timing analyzer: for a 4 bit unit the delay with optimal routing is 10.5 ns (256 and 512 bit designs) and 12.7 ns (768 and 1024 bit designs); for an 8 bit unit 11.2 ns and 13.7 ns; and for a 16 bit unit 12.8 ns and 15.5 ns. The larger designs were implemented in larger FPGA devices featuring different delay specifications. Otherwise we expect the same cycle times for designs with the same unit size.

The additional routing delay is between 50% and 80% above the optimal propagation delay. For designs up to 768 and 1024 ($u = 4$) bits it remains approximately constant; it deteriorates for 1024 bit designs with unit sizes $u = 8$ and $u = 16$. The same can be said about the place and route time: we experienced run times of a couple of hours on an AMD K6-2/300 MHz PC for designs up to 768 and 1024 ($u = 4$) bits, and up to a week for the 1024 ($u = 8$ and $u = 16$) bit designs. Different design methods, such as hard macros for a single unit, would probably improve routing delay and place and route time.

The time area product shows that designs with 8 bit units are generally most efficient.

Table 2. CLB usage and execution time for a full modular exponentiation. Columns for the 512 bit, 768 bit and 1024 bit designs: C [CLBs], T [ms].

Table 2 shows the application of our results to public key schemes where the Chinese remainder theorem cannot be applied.
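The execution time of a full exponentiation follows from the cycle count $2(n + 2)(n + 4)$ of Section 4.4 multiplied by the clock period, and the unit area from the $3u + 4$ CLB approximation of Section 4.2. The sketch below only restates that arithmetic. The function names are invented for this example, the area estimate excludes the RAM, counter and state machine overhead, and the 20 ns clock period used in the demonstration is an arbitrary placeholder, not a measured value from the tables.

```python
def full_exponentiation_cycles(n: int) -> int:
    """Clock cycles for an n bit exponentiation with an n bit exponent."""
    return 2 * (n + 2) * (n + 4)


def unit_array_clbs(n: int, u: int) -> int:
    """Approximate CLBs of the n/u processing units alone (no overhead)."""
    return (n // u) * (3 * u + 4)


def execution_time_ms(n: int, clock_ns: float) -> float:
    """Execution time in milliseconds for a given clock period in ns."""
    return full_exponentiation_cycles(n) * clock_ns * 1e-6


if __name__ == "__main__":
    placeholder_clock_ns = 20.0   # hypothetical clock period, for illustration only
    for n in (512, 768, 1024):
        print(n, full_exponentiation_cycles(n),
              unit_array_clbs(n, 8), execution_time_ms(n, placeholder_clock_ns))
```

Since the cycle count does not depend on the unit width $u$, the choice of $u$ affects the execution time only through the achievable clock period.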
A full modular exponentiation with an $n$ bit exponent is computed in $2(n + 2)(n + 4)$ clock cycles.

6.2 Application to RSA

Table 3 shows our results from the tables above, applied to RSA. The encryption time is calculated for the $F_4$ exponent, requiring $2 \cdot 19(n + 4)$ clock cycles. Using the $F_4$ exponent, only one multiplication can be calculated in parallel to a squaring.

Table 3. Application to RSA: Encryption. Columns for the 512 bit and 1024 bit designs: C [CLBs], T [ms].

For decryption we apply the Chinese remainder theorem. We either decrypt $n$ bits with an $n/2$ bit architecture serially, or with two $n/2$ bit architectures in parallel. The first approach uses only half as many resources, the latter is twice as fast.

Table 4. Application to RSA: Decryption. Columns for 512 bit (2 × 256 serial), 512 bit (2 × 256 parallel), 1024 bit (2 × 512 serial) and 1024 bit (2 × 512 parallel): C [CLBs], T [ms].

6.3 Comparison and Outlook

We compare our fastest RSA 512/1024 bit designs of Table 4 to the fastest software and hardware solutions we found in the literature [17, 13, 21]. Our 2.37 ms decryption time is about four times faster than the 512 bit software implementation (9.1 ms) on a 150 MHz Alpha [13]. The fastest 1024 bit software implementation [21] of 43.3 ms running on a PPro 200 based PC is about 4 times slower than our best result (10.2 ms). The fastest reported hardware design [17] (1.7 ms for a 512 bit modulus and 5.2 ms for a 970 bit modulus) is a factor 1.4/1.7 faster than ours (9.1 ms for a 970 bit modulus).

A drawback of the solution in [17] is, however, that the binary representation of the modulus is hardwired into the logic representation so that the architecture has to be reconfigured with every new modulus. The user of such an implementation needs to own the full development tools for synthesis, placing and routing of FPGAs if RSA with different moduli is to be executed. Our design stores the modulus, the exponent and the precomputation factor in registers and RAM.
A second advantage of our design is that it is implemented in one device instead of a matrix of 16 devices. Using currently available FPGA technology, however, the design [17] would probably also fit in a single device.

To improve our design in terms of speed, three approaches can be taken:
1. Computation of one bit per processing unit (25% improvement estimated).
2. Montgomery multiplication with a radix $r = 2^u$, $u \geq 2$. Computation of a full modular exponentiation in $O(n^2/u)$ cycles instead of $O(n^2)$.

Both approaches have the major disadvantage that considerably more resources will be used. We will concentrate our future research on trying to implement a higher radix design according to approach 3). The challenge at hand is to accommodate simplifications as proposed in [6] to systolic array and FPGA technology.

References

[1] J. Bajard, L. Didier, and P. Kornerup. An RNS Montgomery modular multiplication algorithm. IEEE Transactions on Computers, 47(7):766–76, July 1998.
[2] T. Beth and D. Gollmann. Algorithm engineering for public key algorithms. IEEE Journal on Selected Areas in Communications, 7(4):458–65, May 1989.
[3] E. Brickell. A survey of hardware implementations of RSA. In Advances in Cryptology, CRYPTO '89. Springer-Verlag, 1990.
[4] S. E. Eldridge and C. D. Walter. Hardware implementation of Montgomery's modular multiplication algorithm. IEEE Transactions on Computers, 42(6), July 1993.
[5] W. Gai and H. Chen. A systolic linear array for modular multiplication. In 2nd International Conference on ASIC, pages 171–4, 1996.
[6] H. Orup. Simplifying quotient determination in high-radix modular multiplication. In Proceedings 12th Symposium on Computer Arithmetic, pages 193–9, 1995.
[7] K. Iwamura, T. Matsumoto, and H. Imai. Montgomery modular-multiplication method and systolic arrays suitable for modular exponentiation. Electronics and Communications in Japan, Part 3, 77(3):40–51, March 1994.
[8] D. Knuth. The Art of Computer Programming. Volume 2: Seminumerical Algorithms. Addison-Wesley, Reading, Massachusetts, 2nd edition, 1981.
[9] P. Kornerup. A systolic, linear-array multiplier for a class of right-shift algorithms. IEEE Transactions on Computers, 43(8):892–8, August 1994.
[10] P. Montgomery. Modular multiplication without trial division. Mathematics of Computation, 44(170):519–21, April 1985.
[11] J. Quisquater and C. Couvreur. Fast decipherment algorithm for RSA public key cryptosystem. Electronics Letters, 18:905–7, October 1982.
[12] R. Rivest, A. Shamir, and L. Adleman. A method for obtaining digital signatures and public key cryptosystems. Communications of the ACM, 21(2):120–6, Feb. 1978.
[13] M. Shand and J. Vuillemin. Fast implementations of RSA cryptography. In Proceedings 11th IEEE Symposium on Computer Arithmetic, 1993.
[14] D. R. Stinson. Cryptography, Theory and Practice. CRC Press, 1995.
[15] N. Takagi. A radix-4 modular multiplication hardware algorithm efficient for iterative modular multiplications. In Proceedings 10th IEEE Symposium on Computer Arithmetic, pages 35–42, 1991.
[16] A. Tiountchik. Systolic modular exponentiation via Montgomery algorithm. Electronic Letters, 34(9):874–5, April 1998.
[17] J. Vuillemin, P. Bertin, D. Roncin, M. Shand, H. Touati, and P. Boucard. Programmable active memories: Reconfigurable systems come of age. IEEE Transactions on VLSI Systems, 4(1):56–69, Mar. 1996.
[18] C. Walter. Fast modular multiplication using 2-power radix. International Journal of Computer Mathematics, 39(1–2):21–8, 1991.
[19] C. Walter. Systolic modular multiplication. IEEE Transactions on Computers, 42(3):376–8, March 1993.
[20] P. Wang. New VLSI architectures of RSA public key cryptosystems. In Proceedings of 1997 IEEE International Symposium on Circuits and Systems, volume 3, 1997.
[21] E. D. Win, S. Mister, B. Preneel, and M. Wiener. On the performance of signature schemes based on elliptic curves. In Algorithmic Number Theory Symposium III. Springer-Verlag, 1998.
[22] Xilinx Inc., San Jose, CA. The Programmable Logic Data Book.
[23] J. Yong-Yin and W. Burleson. VLSI array algorithms and architectures for RSA modular multiplication. IEEE Transactions on VLSI Systems, 5(2):211–7, June 1997.


More information

Design of Memory Based Implementation Using LUT Multiplier

Design of Memory Based Implementation Using LUT Multiplier Design of Memory Based Implementation Using LUT Multiplier Charan Kumar.k 1, S. Vikrama Narasimha Reddy 2, Neelima Koppala 3 1,2 M.Tech(VLSI) Student, 3 Assistant Professor, ECE Department, Sree Vidyanikethan

More information

UNIT III. Combinational Circuit- Block Diagram. Sequential Circuit- Block Diagram

UNIT III. Combinational Circuit- Block Diagram. Sequential Circuit- Block Diagram UNIT III INTRODUCTION In combinational logic circuits, the outputs at any instant of time depend only on the input signals present at that time. For a change in input, the output occurs immediately. Combinational

More information

LOW POWER AND HIGH PERFORMANCE SHIFT REGISTERS USING PULSED LATCH TECHNIQUE

LOW POWER AND HIGH PERFORMANCE SHIFT REGISTERS USING PULSED LATCH TECHNIQUE OI: 10.21917/ijme.2018.0088 LOW POWER AN HIGH PERFORMANCE SHIFT REGISTERS USING PULSE LATCH TECHNIUE Vandana Niranjan epartment of Electronics and Communication Engineering, Indira Gandhi elhi Technical

More information

Chapter 4. Logic Design

Chapter 4. Logic Design Chapter 4 Logic Design 4.1 Introduction. In previous Chapter we studied gates and combinational circuits, which made by gates (AND, OR, NOT etc.). That can be represented by circuit diagram, truth table

More information

Low Power VLSI Circuits and Systems Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Low Power VLSI Circuits and Systems Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Low Power VLSI Circuits and Systems Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No. # 29 Minimizing Switched Capacitance-III. (Refer

More information

Chapter 5 Sequential Circuits

Chapter 5 Sequential Circuits Logic and Computer Design Fundamentals Chapter 5 Sequential Circuits Part 2 Sequential Circuit Design Charles Kime & Thomas Kaminski 28 Pearson Education, Inc. (Hyperlinks are active in View Show mode)

More information

Design and Analysis of Modified Fast Compressors for MAC Unit

Design and Analysis of Modified Fast Compressors for MAC Unit Design and Analysis of Modified Fast Compressors for MAC Unit Anusree T U 1, Bonifus P L 2 1 PG Student & Dept. of ECE & Rajagiri School of Engineering & Technology 2 Assistant Professor & Dept. of ECE

More information

Adding Analog and Mixed Signal Concerns to a Digital VLSI Course

Adding Analog and Mixed Signal Concerns to a Digital VLSI Course Session Number 1532 Adding Analog and Mixed Signal Concerns to a Digital VLSI Course John A. Nestor and David A. Rich Department of Electrical and Computer Engineering Lafayette College Abstract This paper

More information

FPGA Hardware Resource Specific Optimal Design for FIR Filters

FPGA Hardware Resource Specific Optimal Design for FIR Filters International Journal of Computer Engineering and Information Technology VOL. 8, NO. 11, November 2016, 203 207 Available online at: www.ijceit.org E-ISSN 2412-8856 (Online) FPGA Hardware Resource Specific

More information

LUT Design Using OMS Technique for Memory Based Realization of FIR Filter

LUT Design Using OMS Technique for Memory Based Realization of FIR Filter International Journal of Emerging Engineering Research and Technology Volume. 2, Issue 6, September 2014, PP 72-80 ISSN 2349-4395 (Print) & ISSN 2349-4409 (Online) LUT Design Using OMS Technique for Memory

More information

Enhancing Performance in Multiple Execution Unit Architecture using Tomasulo Algorithm

Enhancing Performance in Multiple Execution Unit Architecture using Tomasulo Algorithm Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 6.017 IJCSMC,

More information

Design for Testability

Design for Testability TDTS 01 Lecture 9 Design for Testability Zebo Peng Embedded Systems Laboratory IDA, Linköping University Lecture 9 The test problems Fault modeling Design for testability techniques Zebo Peng, IDA, LiTH

More information