Montgomery Modular Exponentiation on Reconfigurable Hardware


Thomas Blum
Worcester Polytechnic Institute
ECE Department
Worcester, MA, USA

Christof Paar

The research was supported in part through an NSF CAREER award #CCR

Abstract

It is widely recognized that security issues will play a crucial role in the majority of future computer and communication systems. Central tools for achieving system security are cryptographic algorithms. For performance as well as for physical security reasons, it is often advantageous to realize cryptographic algorithms in hardware. In order to overcome the well-known drawback of reduced flexibility that is associated with traditional ASIC solutions, this contribution proposes arithmetic architectures which are optimized for modern field programmable gate arrays (FPGAs). The proposed architectures perform modular exponentiation with very long integers. This operation is at the heart of many practical public-key algorithms such as RSA and discrete logarithm schemes. We combine the Montgomery modular multiplication algorithm with a new systolic array design, which is capable of processing a variable number of bits per array cell. The designs are flexible, allowing any choice of operand and modulus. Unlike previous approaches, we systematically implement and compare several variants of our new architecture for different bit lengths. We provide absolute area and timing measures for each architecture. The results allow conclusions about the feasibility and time-space trade-offs of our architecture for implementation on Xilinx XC4000 series FPGAs. As a major practical result we show that it is possible to implement modular exponentiation at secure bit lengths on a single commercially available FPGA.

1 Introduction

It is widely recognized that security issues will play a crucial role in many future computer and communication systems. Central tools for achieving system security are cryptographic algorithms.
For performance as well as for physical security reasons, it is often required to realize cryptographic algorithms in hardware. Traditional ASIC solutions, however, have the well-known drawback of reduced flexibility compared to software solutions. Since modern security protocols are increasingly defined to be algorithm independent, a high degree of flexibility with respect to the cryptographic algorithms is desirable. A promising solution which combines high flexibility with the speed and physical security of traditional hardware is the implementation of cryptographic algorithms on reconfigurable devices such as FPGAs and EPLDs. In the case of public-key schemes, algorithm independence can mean not only a change of the actual crypto algorithm but also a change of parameters such as bit length, modulus, or exponents.

This contribution deals with arithmetic architectures for modular exponentiation with very long integers, which is at the heart of most modern public-key schemes. Most notably, both RSA and discrete logarithm-based schemes (e.g., Diffie-Hellman key exchange or the Digital Signature Algorithm, DSA) require modular long number exponentiation. The challenge at hand is to design such arithmetic architectures for operands with up to 1024 bits on current FPGAs. The very long word lengths prohibit the application of many proposed architectures, as they would result in unrealistically large resource requirements.

In this contribution we derive a modular exponentiation architecture which combines Montgomery's modular reduction scheme and a novel systolic array architecture. The systolic array architecture requires considerably fewer logic resources than many other systolic array architectures for modular arithmetic. This is crucial, as one of our goals was to derive solutions that can fit into a single FPGA. Clearly, a design which fits in a single FPGA has many cost and design advantages over multi-FPGA solutions.
Another important objective was to systematically implement various architecture options for different bit lengths.

This contribution is structured as follows. In Section 2, we summarize some of the previous work on modular exponentiation. Section 3 describes algorithms for modular exponentiation and multiplication and some simplifications and speedups for their hardware implementation. In this section we also describe some of the relevant features of the Xilinx XC4000 FPGA series. Section 4 outlines our architecture for modular exponentiation. Section 5 briefly describes our methodology and the tools that were used for this research. Section 6 reports the timing and area results obtained. A comparison to other architectures and an outlook conclude this contribution.

2 Previous Work

In the following, we summarize relevant previous work in the field of modular multiplication. Most proposed approaches are based on Montgomery's algorithm [10], either in conjunction with a redundant number representation or in a systolic array architecture. Solutions using other algorithms have also been presented.

To avoid the carry propagation in multiplication/addition architectures, several solutions have been proposed in the literature. They use Montgomery's algorithm in combination with either a redundant radix number system [5, 3, 7, 4, 8, 6] or the Residue Number System [1].

The Research Laboratory of Digital Equipment Corp. in Paris implemented modular exponentiation in architectures on FPGAs [17, 13]. They utilized an array of 16 XILINX 3090 FPGAs. Compared to the XILINX 4000 series in terms of flip-flops, this is equivalent to a chip with 500 configurable logic blocks (CLBs). In terms of logic resources this is equivalent to a chip of 4000 CLBs. In their work they used several speedup methods [13], including the Chinese remainder theorem, an asynchronous carry completion adder, and a windowing method. The implementation computes a 970 bit RSA decryption at a rate of 185 kb/s (5.2 ms per 970 bit decryption) and a 512 bit RSA decryption in excess of 300 kb/s (1.7 ms per 512 bit decryption).
A drawback of this solution is that the binary representation of the modulus is hardwired into the logic representation, so that the architecture has to be reconfigured with every new modulus.

There have been a number of proposals for systolic array architectures for modular arithmetic. However, no implementations have been reported to our knowledge. In [5] a VLSI solution is presented where a modular multiplication is calculated in roughly 4n clock cycles (n is the number of bits of the modulus). That is approximately four times more cycles than in a conventional solution. In terms of resources this design would be suitable for FPGAs. Similar two-dimensional systolic arrays are presented in [7, 9, 20, 6]. For a radix of two they all propose an n × n matrix of one-bit processing elements. With this configuration 2n modular multiplications are calculated at the same time and the theoretical throughput is one modular multiplication per clock cycle. In terms of resources, such a solution is not feasible in either VLSI or FPGA for the bit lengths required in public-key algorithms. Even implementing only one row of processing elements (resulting in n times slower throughput) in presently available FPGAs is difficult in terms of resources. We tried to overcome the shortage of resources per chip by using larger processing elements and thus saving overhead.

Reference [2] provides a good overview of previously presented architectures for VLSI implementations of modular integer arithmetic. Reference [3] summarizes the chips available in 1990 for performing RSA encryption.

More recently an approach [23] has been presented that utilizes precomputed complements of the modulus and is based on the iterative Horner's rule. In contrast to Montgomery's algorithm, these approaches use the most significant bits of an intermediate result to decide which multiples of the modulus to subtract. The drawback of these solutions is that they either need a large amount of storage space or many clock cycles to complete a modular multiplication.
The authors attempted to overcome the latter problem by a higher clock frequency, which is possible due to a simplified modulo reduction operation.

3 Preliminaries

3.1 Modular Exponentiation and RSA

We start this section with a short description of the RSA algorithm, proposed by Rivest, Shamir and Adleman [12] in 1978. The algorithm is based on modular exponentiation of integers. The private key of a user consists of two large primes p and q and an exponent D. The public key consists of the modulus $N = p \cdot q$ and an exponent E such that $E = D^{-1} \bmod (p-1)(q-1)$. In the remainder of the article we always assume that N can be represented by n bits. To encrypt a message X the user computes:

$Y = X^E \bmod N$

Decryption is done by calculating:

$X = Y^D \bmod N$

The identical operations are utilized for the RSA digital signature scheme. In order to thwart currently known attacks, the modulus N, and thus X and Y, must be sufficiently long. Both encryption and decryption require algorithms for computing a modular exponentiation. This can be realized by using the square and multiply algorithm [4]. To compute squaring and multiplication in parallel we can use the following version [20]:
Algorithm 1: computes $P = X^E \bmod N$, where $E = \sum_{i=0}^{n-1} e_i 2^i$, $e_i \in \{0, 1\}$
1. $P_0 = 1$, $Z_0 = X$
2. for $i = 0$ to $n-1$ do
3. $Z_{i+1} = Z_i^2 \bmod N$
4. if $e_i = 1$ then $P_{i+1} = P_i \cdot Z_i \bmod N$ else $P_{i+1} = P_i$

Algorithm 1 takes 2n operations in the worst case and 1.5n on average. For speeding up encryption the use of a short exponent E has been proposed [8]. Recommended by ITU is the Fermat prime $F_4 = 2^{16} + 1$. Using $F_4$, the encryption is executed in only 17 operations. Other short exponents proposed include E = 3 and E = 17.

Obviously the same trick cannot be used for decryption, as the decryption exponent D must be kept secret. But using the knowledge of the factors of $N = q \cdot p$, the Chinese Remainder Theorem [11] can be applied by the decrypting party. Two modular exponentiations of size n/2 and an additional recombination, instead of one modular exponentiation of size n, are computed in this case. Each modular exponentiation of length n/2 takes 1/4 of the time required for an n bit exponentiation. If both exponentiations are performed serially, an overall speedup factor of two is achieved. If they are performed in parallel, a speedup factor of four is achieved.

3.2 Montgomery Modular Multiplication

As shown in the previous section, modular exponentiation is reduced to a series of modular multiplications and squarings. The algorithm for modular multiplication described below was proposed by P. L. Montgomery in 1985 [10]. Several optimizations were taken from reference [9]:

Algorithm 2: Montgomery Modular Multiplication (radix 2) for computing $A \cdot B \bmod N$, where $B = \sum_{i=0}^{n+1} b_i 2^i$, $b_i \in \{0, 1\}$, $b_0 = 0$, and $A = \sum_{i=0}^{n+2} a_i 2^i$, $a_i \in \{0, 1\}$, $a_{n+1} = 0$, $a_{n+2} = 0$
1. $R_0 = 0$
2. for $i = 0$ to $n+2$ do
3. $q_i = R_i(0)$
4. $R_{i+1} = (R_i + a_i \cdot B + q_i \cdot N)/2$

B is shifted up one bit with $b_0 = 0$. This measure simplifies the computation of $q_i$ compared to the original algorithm. The loop of Algorithm 2 is executed three more times than originally proposed. With this step we make sure the inequalities $R_i < 3N$ and $R_{n+3} < 2N$ always hold.
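The loop of Algorithm 2 can be sketched in software as follows. This is a minimal Python model, not the hardware design: the function name `montgomery_multiply` and its parameter names are hypothetical, B is assumed to be passed in already shifted up by one bit (so it is even and smaller than 4N), and A is assumed to be smaller than 2N so that its two top bits are zero.

```python
def montgomery_multiply(a, b_shifted, modulus, n):
    """Software model of the radix-2 loop of Algorithm 2 (n + 3 iterations).

    a          -- multiplier A, assumed < 2 * modulus (bits n+1 and n+2 are 0)
    b_shifted  -- multiplicand B, already shifted up one bit (b_0 = 0)
    modulus    -- odd modulus N that fits into n bits
    Returns R with R * 2**(n + 3) congruent to a * b_shifted modulo N.
    """
    if modulus % 2 == 0:
        raise ValueError("the modulus must be odd")
    if b_shifted % 2 != 0:
        raise ValueError("B must be shifted up so that b_0 = 0")
    r = 0
    for i in range(n + 3):
        a_i = (a >> i) & 1
        q_i = r & 1  # q_i = R_i(0), the least significant bit of R_i
        # b_shifted is even and r + q_i * modulus is even, so the shift is exact
        r = (r + a_i * b_shifted + q_i * modulus) >> 1
    return r
```

Because no final comparison and subtraction is performed, the returned value is only guaranteed to lie below 2N under the stated input assumptions; it is congruent to, but not necessarily equal to, the reduced residue.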
The result of a modular multiplication $R_{n+3}$ can thus be reused as input A and B for the next multiplication. We avoid the originally proposed final comparison and subtraction and make a pipelined execution of the algorithm possible.

A precondition for the algorithm to work is that the modulus N has to be relatively prime to the radix. In RSA this is always satisfied, as N is a product of two primes and therefore odd.

The algorithm above calculates $R_{n+3} = (2^{-n-3} A B) \bmod N$. To get the right result we need an extra Montgomery modular multiplication by $2^{2n+6} \bmod N$. However, if further multiplications are required, as for exponentiation, it is better to pre-multiply all inputs by the factor $2^{2n+6} \bmod N$. Thus every intermediate result carries a factor $2^{n+3}$. We just need to Montgomery multiply the result by 1 to eliminate that factor.

The final Montgomery multiplication with 1 makes sure our final result is smaller than N. Consider Algorithm 2 with $B < 4N$ (B shifted up) and $A = (0, \ldots, 0, 1)$. We will get $R_1 = B/2 < 2N$. As all remaining $a_i = 0$, we get at most $R_{i+1} = (R_i + N)/2 \to N$. If only one $q_i = 0$ ($i = 1, 2, \ldots, n+2$), then $R_{i+1} = R_i/2 < N$ (probability: $1 - 2^{-(n+2)}$).

The whole computational complexity of Algorithm 2 lies in the three additions of n bit operands for computing $R_{i+1}$. As the propagation of n carries is too slow and an equivalent carry look-ahead logic requires too many resources, two different strategies have been pursued in the literature:

1. Redundant representation: The intermediate results are kept in redundant form. Resolution into binary representation is only done at the very end and for feeding the intermediate result back as $a_i$ in Algorithm 2.
2. Systolic arrays: n processing units calculate 1 bit per clock cycle. The computed carries, $q_i$ and $a_i$ are pumped through the processing units. As these signals have to be distributed only between adjacent processing units, a faster clock speed and a resulting higher throughput should be possible. The cost is a higher latency and possibly more resources.
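The pre-multiplication by $2^{2n+6} \bmod N$ and the final multiplication by 1 can be illustrated with a short Python sketch that combines them with the square-and-multiply scheme of Algorithm 1. The names `mont` and `mont_exp` are hypothetical. As a simplifying assumption, the helper uses the classic quotient rule $q_i = (R_i + a_i B) \bmod 2$ with an unshifted B, rather than the shifted-B simplification of Algorithm 2, so that its result is congruent to $2^{-(n+3)} A B$ for any B.

```python
def mont(a, b, modulus, n):
    """Return R congruent to a * b * 2**-(n + 3) mod N, with no final subtraction."""
    r = 0
    for i in range(n + 3):
        a_i = (a >> i) & 1
        q_i = (r + a_i * b) & 1  # classic quotient rule (assumption of this sketch)
        r = (r + a_i * b + q_i * modulus) >> 1
    return r


def mont_exp(x, e, modulus, n):
    """Compute x**e mod N for odd N < 2**n and 0 <= x < N via Montgomery steps."""
    pre = pow(2, 2 * n + 6, modulus)  # pre-computation factor 2^(2n+6) mod N
    z = mont(x, pre, modulus, n)      # Z_0 = X * 2^(n+3)
    p = mont(1, pre, modulus, n)      # P_0 = 1 * 2^(n+3)
    while e:
        if e & 1:
            p = mont(p, z, modulus, n)  # multiplication uses the current Z_i
        z = mont(z, z, modulus, n)      # squaring prepares Z_(i+1)
        e >>= 1
    r = mont(p, 1, modulus, n)        # multiply by 1 to strip the factor 2^(n+3)
    # r can equal N only when the true result is 0; map that case explicitly
    return 0 if r == modulus else r
```

For example, with the toy modulus N = 2773 = 47 · 59 and n = 12, `mont_exp(x, 17, 2773, 12)` agrees with Python's built-in `pow(x, 17, 2773)`.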
3.3 Xilinx XC4000 Series FPGAs

In this section we present some of the relevant features of the Xilinx XC4000 Series FPGAs and introduce a metric for FPGA cost and performance evaluation.

An FPGA device consists of three types of reconfigurable elements: the Configurable Logic Blocks (CLBs), I/O blocks (IOBs) and routing resources [22]. An XC4000 CLB is made up of 3 look-up tables, two flip-flops and programmable multiplexers. Any boolean function of 5 inputs, any 2 functions of 4 inputs and some functions of up to 9 inputs can be computed in one CLB. The multiplexers can route these signals directly to the outputs or to the flip-flops. In the first case the flip-flops can be utilized to store direct inputs. Programmable routing resources connect the CLBs and IOBs into a network. For signal distribution all over the device there are 8 global nets available.

Another feature of the CLB is its dedicated hardware to accelerate the carry path of adders and counters [22]. An n bit ripple carry adder is implemented in $n/2 + 2$ CLBs. As the carry signal uses dedicated interconnects, there is no routing delay in the path and the total delay is fixed: $t_{pd} = 4.5 + n \cdot 0.35$ [ns].

On-chip RAM reduces the cost of data storage. A single CLB can be used for a 16 × 2 bit or 32 × 1 bit ROM/RAM or for a 16 × 1 bit Dual Port RAM.

In previous work [20, 9, 4] the gate count model has been used for cost evaluation and the gate delay model for speed evaluation. This is not appropriate for FPGAs. As the functional unit of an FPGA is the CLB, we evaluate the cost (C) in number of CLBs. The operation time (T) consists of logic delay in the CLBs and routing delay and is obtained from Xilinx's Timing Analyzer software. As a third parameter we use the time area product (TA). It is defined as time multiplied by cost.

4 A New Architecture

4.1 Design Overview

As described in Section 3.2, two principal approaches have been proposed to compute Montgomery modular multiplication. A solution following approach 1 has already been implemented in FPGA [17]. The second approach, using systolic arrays, has drawn considerable attention in the research community. However, no architectures that specifically target FPGAs have been reported, nor are there reports of implementations of such systolic architectures. Our contribution targets these two goals.

Our system can be divided hierarchically into three levels:

1. Processing Element: Computes u bits of a modular multiplication.
2. Modular Multiplication: An array of processing elements computes a modular multiplication.
3. Modular Exponentiation: Combines modular multiplications to a modular exponentiation according to Algorithm 1.

In the following we describe the system with a bottom-up approach.

4.2 Processing Elements

A general radix 2 systolic array as proposed in [7, 9, 6, 5] utilizes n times n processing elements. As this approach would result in unrealistically large CLB counts for the bit lengths required in modern public-key schemes, we implemented only one row of processing elements. To further reduce the required number of CLBs we implemented processing elements (units) of u = 4, 8, 16 bits. With this approach we need only n/u instead of n processing elements, and a considerable amount of overhead can be saved. Similar to the approach in [9] we compute squarings and multiplications of Algorithm 1 in parallel. As explained in Section 4.3, this measure fully utilizes every cycle.

[Figure 1. Processing Element (unit)]

In the processing elements we need the following registers:

- N_Reg (u bits): storage of the modulus
- B_Reg (u bits): storage of the B multiplier
- B+N_Reg (u bits): storage of the intermediate result B + N
- Add_Reg_1 (u + 1 bits): storage of the intermediate result
- Add_Reg_2 (u - 1 bits): storage of the intermediate result
- Control_Reg (3 bits): control of the multiplexers and clock enables
- $a_i$, $q_i$ (2 bits): multiplier A, quotient Q, according to Algorithm 2
- Result_Reg (u bits): storage of the result at the end of a multiplication
The registers need a total of $(6u + 5)/2$ CLBs.

Instead of computing $(R_i + a_i \cdot B + q_i \cdot N)/2$ in each iteration, we compute N + B once and store the result in the B+N_Reg. Multiplexer Mx_1 selects one of its inputs 0, N, B, B + N to be added to R according to the value of the binary variables $a_i$ and $q_i$. The additional cost is a u bit register, a slightly more complicated multiplexer Mx_1, and two more clock cycles per multiplication. The advantage is that only a two-operand adder is needed, which can be implemented with the ripple carry adder optimized for the Xilinx XC4000 series (see Section 3.3). Also we need only one carry instead of two between units. The carry propagation delay of a 16 bit adder is equivalent to only one additional CLB delay. The adder can be combined into the CLBs of Add_Reg_1; we therefore need no additional CLBs. An additional register Add_Reg_2 allows storage of a multiplication while a squaring is computed and vice versa.

The decoded control register signals and the $a_i$, $q_i$ signals control the multiplexers Mx_B, Mx_1, Mx_2, Mx_Res and the clock enables of the registers. N_Reg is loaded only when the modulus is changed, B_Reg and B+N_Reg after each completion of Algorithm 2. Mx_1 feeds 0, B, N or B + N into the adder according to the $a_i$ and $q_i$ bits. Mx_2 feeds N (for calculation of N + B) or the u - 1 most significant bits of the result plus the least significant result bit of the next unit (division by two / shift right) back into the adder. Mx_Res selects either the result of this unit or the one to the left to be stored into Result_Reg.

Theoretically the implementation of the multiplexers and decoders would cost an additional 4u + 4 CLBs. The possibility of reusing registers for combinatorial logic allows some savings of CLBs. Mx_B and Mx_Res are implemented in the CLBs of B_Reg and Result_Reg, Mx_1 and Mx_2 partially in N_Reg and B+N_Reg. The resulting costs are approximately 3u + 4 CLBs per u bit processing unit.

We compare this expense to the resources needed for a one bit unit implementation.
The B + N register would not be needed, as a ripple carry adder for such a small adder makes no sense. We would need a total of seven bits of register space (N, B, $a_i$, $q_i$, control (2) and result) and a 4 bit input, 3 bit output (2 carries, result) adder. Together with one or two CLBs for decoding the control word and multiplexing, we would have a total of 6 or 7 CLBs per unit. A device that supports such a 1024 bit implementation would need $6.5 \cdot 10^3$ to $7.5 \cdot 10^3$ CLBs, including overhead.

4.3 Modular Multiplication

Figure 2 shows how the processing elements are connected to an array for computing an n bit modular multiplication. Starting at the rightmost unit 0, the control word, $a_i$, and $q_i$ are fed into their registers. The adder computes Add_Reg_2 plus B/N/B + N in one clock cycle according to $a_i$ and $q_i$. The least significant bit of the result is read back as $q_{i+1}$ for the next computation. The resulting carry bit, the control word, $a_i$ and $q_i$ are pumped into the unit to the left, where the same computation takes place in the next clock cycle.

[Figure 2. Systolic Array for modular multiplication]

In such a systolic fashion the control word, $a_i$, $q_i$, and the carry bits are pumped from right to left through the whole unit array. The division by two in Algorithm 2 also leads to a shift right operation. The least significant bit of a unit's addition (Res_0) is always fed back into the unit to the right. After a modular multiplication is completed, the results are pumped from left to right through the units and consecutively stored in RAM for further processing.

A single processing element computes u bits of $R_{i+1} = (R_i + a_i \cdot B + q_i \cdot N)/2$ of Algorithm 2.
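The way one iteration of Algorithm 2 decomposes into u bit slices with a single carry between neighbouring units can be modelled in a few lines of Python. This is only a functional sketch under stated assumptions: the function name `sliced_iteration` is hypothetical, all slices are evaluated in one call instead of one unit per clock cycle, B is assumed to be shifted up (even), and `num_units * u` is assumed to be wide enough for R and B + N.

```python
def sliced_iteration(r, a_i, b, modulus, u, num_units):
    """Model one step R -> (R + a_i*B + q_i*N) / 2 using u-bit slices.

    Mirrors the unit structure: a 4-to-1 selection between 0, B, N and B + N
    followed by a two-operand addition whose single carry bit ripples from
    unit 0 (rightmost) to the unit on its left.
    """
    width = num_units * u
    q_i = r & 1                                  # quotient bit (B is assumed even)
    operand = (0, b, modulus, b + modulus)[a_i + 2 * q_i]  # analogue of Mx_1
    if r >> width or operand >> width:
        raise ValueError("num_units * u is too small for the operands")
    mask = (1 << u) - 1
    carry = 0
    total = 0
    for k in range(num_units):
        r_slice = (r >> (k * u)) & mask
        op_slice = (operand >> (k * u)) & mask
        s = r_slice + op_slice + carry           # two-operand adder with carry-in
        total |= (s & mask) << (k * u)
        carry = s >> u                           # at most one carry bit per unit
    total |= carry << width                      # carry out of the leftmost unit
    return total >> 1                            # division by two (shift right)
```

Selecting a precomputed B + N keeps every slice to a single two-operand addition, which is why only one carry bit has to cross each unit boundary in this model.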
In clock cycle i, unit 0 computes bits $0 \ldots u-1$ of $R_i$. In cycle i + 1, unit 1 uses the resulting carry and computes bits $u \ldots 2u-1$ of $R_i$. Unit 0 uses the right shifted (division by 2) bit u of $R_i$ (Res_0) to compute bits $0 \ldots u-1$ of $R_{i+1}$ in clock cycle i + 2. Clock cycle i + 1 is unproductive in unit 0 while waiting for the result of unit 1. This inefficiency is avoided by computing squares and multiplications in parallel according to Algorithm 1. Both $p_{i+1}$ and $z_{i+1}$ depend on $z_i$. We therefore store the intermediate result $z_i$ in the B registers and feed $z_i$ and $p_i$ into the $a_i$ input of the units for squaring and multiplication.

4.4 Modular Exponentiation

Figure 3 shows how the array of units is utilized for modular exponentiation. First, the exponent E and the pre-computation factor $2^{2n+6} \bmod N$ are read from I/O and stored into RAM (Exp and Prec). Then the modulus N is read from I/O and fed on the u bit wide N bus to the N registers of the units. These steps have to be executed only if the system parameters need to be changed.

Next we read the X value from I/O, u bits per clock cycle, and store it into the dual port (DP) RAM Z. At the same time the pre-computation factor $2^{2n+6} \bmod N$ is read from Prec RAM and fed, u bits per clock cycle, via the B bus to the B registers of the units.
[Figure 3. Design for a modular exponentiation]

Execution of Algorithm 1 begins in parallel to the reading of X. Initially we have $P_0 = 1$ and $Z_0 = X$. First we multiply both values by the pre-computation factor $2^{2n+6} \bmod N$. This is done by time multiplexing X and 1 in the time division multiplexing unit (TDM), pumping the result as $a_i$ into the units and multiplying it by $2^{2n+6} \bmod N$, which is already stored in the B registers. The results of the two pre-computations are stored into DP RAM Z and DP RAM P.

Squaring is now straightforward: the intermediate result $Z_i$ is always stored into the B registers and into DP RAM Z and fed via $a_i$ back into the units. Multiplication is done almost the same way. $P_{i+1}$ is always computed by feeding $P_i$ into the units, but the result is stored into DP RAM P only if the exponent bit $e_i$ is equal to 1. In this way always the last stored $P_i$ is pumped back into the units.

To eliminate the factor $2^{n+3}$ (see Section 4.3) from the result $P_n$, we compute a final Montgomery multiplication with inputs $P_n$ and 1. The value $(0, 0, \ldots, 0, 1)$ is stored via the B bus into the B registers, and $P_n$ is fed from DP RAM P as $a_i$ into the units.

A full modular exponentiation is computed in $2(n+2)(n+4)$ clock cycles. That is the delay from inserting the first u bits of X into the device until the first u result bits appear at the output. At that point, another X value can enter the device. With a latency of n/u clock cycles the last u bits appear on the output bus.

5 Methodology

In our implementation we adopted the following design flow approach, which resulted in fast verification of gate level netlists as well as back-annotated designs:

1. Design entry
2. Logic verification
3. Synthesis
4. Place and Route
5. Timing Verification

The entire design, with the exception of vendor specific soft macros, was entered in VHDL format.
Once the design was developed in VHDL, boolean logic and major timing errors were verified by simulating the gate level description with the Synopsys VHDL analyzer (vhdlan) and VHDL debugger (vhdldbx). The next step involved the synthesis of the VHDL code with the Synopsys Design Compiler (fpga analyzer). The output of this step was an optimized netlist describing the gate level design in XILINX format. The most time consuming step was the compilation of the synthesized design with the place and route tools available from Xilinx. This process was accomplished with the XILINX Design Manager tools version M1.5.19. The final step of the design flow was to verify the design once again, but this time with the physical net, CLB, and pad delays introduced when the design was placed into a specific device. This was accomplished with the same test benches and simulation models that were used during the logic verification stage. Synopsys (vhdldbx) was used once again to verify the back-annotated designs.

The timing results from Section 6 were all computed by the Xilinx timing analyzer and verified by the Synopsys VHDL debugger. They were not verified with an actual chip.

6 Results

6.1 Modular Exponentiation

We implemented our design for various bit lengths and unit widths. Table 1 shows our results in terms of used CLBs (C), clock cycle time (T) and the time area product (TA).

[Table 1. CLB usage, minimal clock cycle time, and time area product of modular exponentiation architectures on Xilinx FPGAs. Columns for each of the 256, 512, 768 and 1024 bit designs: C [CLBs], T [ns], TA [CLB · ns]. The table entries are missing from this transcription.]
The majority of CLBs is expended in the units. In Section 4.2 we derived an approximation of 3u + 4 CLBs per unit. The overhead consists mainly of RAM, dual port RAM, shift registers, counters and the state machine. An n bit RAM is implemented in n/32 CLBs, a dual port RAM in n/16 CLBs. Counters and their decoding for addressing RAM and dual port RAM are more costly for larger designs. On the other hand, we used the same state machine for all designs in Table 1.

The clock cycle time T in Table 1 is the propagation delay from B_Reg through Mx_1 and the carries of the adder to the registered carry, plus the setup time of the flip-flop. We compare this delay to the optimal cycle time calculated by the Xilinx timing analyzer: for a 4 bit unit the delay with optimal routing is 10.5 ns (256 and 512 bit designs) and 12.7 ns (768 and 1024 bit designs); for an 8 bit unit 11.2 ns and 13.7 ns; and for a 16 bit unit 12.8 ns and 15.5 ns. The larger designs were implemented in larger FPGA devices featuring different delay specifications. Otherwise we expect the same cycle times for designs with the same unit size.

The additional routing delay is between 50% and 80% above the optimal propagation delay. For designs up to 768 bits and 1024 bits (u = 4) it remains approximately constant; it deteriorates for 1024 bit designs with unit sizes u = 8 and u = 16. The same can be said about the place and route time: we experienced run times of a couple of hours on an AMD K6-2/300 MHz PC for designs up to 768 bits and 1024 bits (u = 4), and up to a week for the 1024 bit designs with u = 8 and u = 16. Different design methods, such as hard macros for a single unit, would probably improve routing delay and place and route time.

The time area product shows that designs with 8 bit units are generally most efficient.

[Table 2. CLB usage and execution time for a full modular exponentiation. Columns for each of the 512, 768 and 1024 bit designs: C [CLBs], T [ms]. The table entries are missing from this transcription.]

Table 2 shows the application of our results to public-key schemes where the Chinese remainder theorem cannot be applied.
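Execution times of this kind follow from the cycle count $2(n+2)(n+4)$ given in Section 4.4, multiplied by the clock cycle time T. The small Python helper below is a hedged illustration of that arithmetic: the function names are hypothetical, the clock period is a caller-supplied value rather than a figure from the tables, and the optional `exponent_bits` argument assumes that the factor (n + 2) counts the exponent length.

```python
def exponentiation_cycles(n, exponent_bits=None):
    """Clock cycles for one modular exponentiation with an n-bit modulus.

    With exponent_bits omitted the exponent is assumed to have n bits,
    giving 2 * (n + 2) * (n + 4) cycles.
    """
    if exponent_bits is None:
        exponent_bits = n
    return 2 * (exponent_bits + 2) * (n + 4)


def execution_time_ms(n, clock_ns, exponent_bits=None):
    """Execution time in milliseconds for a given clock period in nanoseconds."""
    return exponentiation_cycles(n, exponent_bits) * clock_ns * 1e-6
```

For a 17 bit exponent such as $F_4$ and a 512 bit modulus, `exponentiation_cycles(512, 17)` evaluates $2 \cdot 19 \cdot (512 + 4)$.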
A full modular exponentiation with an n bit exponent is computed in $2(n+2)(n+4)$ clock cycles.

6.2 Application to RSA

Table 3 shows our results from the tables above, applied to RSA. The encryption time is calculated for the $F_4$ exponent, requiring $2 \cdot 19 (n+4)$ clock cycles. Using the $F_4$ exponent, only one multiplication can be calculated in parallel to a squaring.

[Table 3. Application to RSA: Encryption. Columns for each of the 512 and 1024 bit designs: C [CLBs], T [ms]. The table entries are missing from this transcription.]

For decryption we apply the Chinese remainder theorem. We either decrypt n bits with an n/2 bit architecture serially, or with two n/2 bit architectures in parallel. The first approach uses only half as many resources, the latter is twice as fast.

[Table 4. Application to RSA: Decryption. Columns C [CLBs], T [ms] for each of: 512 bit (2 × 256 serial), 512 bit (2 × 256 parallel), 1024 bit (2 × 512 serial), 1024 bit (2 × 512 parallel). The table entries are missing from this transcription.]

6.3 Comparison and Outlook

We compare our fastest RSA 512/1024 bit designs of Table 4 to the fastest software and hardware solutions we found in the literature [17, 13, 21]. Our 2.37 ms decryption time is about four times faster than the 512 bit software implementation (9.1 ms) on a 150 MHz Alpha [13]. The fastest 1024 bit software implementation [21], taking 43.3 ms on a PPro 200 based PC, is about 4 times slower than our best result (10.2 ms).

The fastest reported hardware design [17] (1.7 ms for a 512 bit modulus and 5.2 ms for a 970 bit modulus) is a factor of 1.4/1.7 faster than ours (9.1 ms for a 970 bit modulus). A drawback of the solution in [17] is, however, that the binary representation of the modulus is hardwired into the logic representation, so that the architecture has to be reconfigured with every new modulus. The user of such an implementation needs to own the full development tools for synthesis, placing and routing of FPGAs if RSA with different moduli is to be executed. Our design stores the modulus, the exponent and the pre-computation factor in registers and RAM.
A second advantage of our design is that it is implemented in one device instead of a matrix of 16 devices. Using currently available FPGA technology, however, the design [17] would probably also fit in a single device.
To improve our design in terms of speed, three approaches can be taken:

1. Computation of one bit per processing unit (25% improvement estimated).
2. Montgomery multiplication with a higher radix $r = 2^k$, $k \ge 2$: computation of a full modular exponentiation in $O(n^2/k)$ cycles instead of $O(n^2)$.

Both approaches have the major disadvantage that considerably more resources will be used. We will concentrate our future research on trying to implement a higher radix design according to approach 3). The challenge at hand is to adapt simplifications as proposed in [6] to systolic array and FPGA technology.

References

[1] J. Bajard, L. Didier, and P. Kornerup. An RNS Montgomery modular multiplication algorithm. IEEE Transactions on Computers, 47(7):766-76, July 1998.
[2] T. Beth and D. Gollmann. Algorithm engineering for public key algorithms. IEEE Journal on Selected Areas in Communications, 7(4):458-65, May 1989.
[3] E. Brickell. A survey of hardware implementations of RSA. In Advances in Cryptology: CRYPTO '89. Springer-Verlag, 1990.
[4] S. E. Eldridge and C. D. Walter. Hardware implementation of Montgomery's modular multiplication algorithm. IEEE Transactions on Computers, 42(6), July 1993.
[5] W. Gai and H. Chen. A systolic linear array for modular multiplication. In 2nd International Conference on ASIC, pages 7 4, 1996.
[6] H. Orup. Simplifying quotient determination in high-radix modular multiplication. In Proceedings 12th Symposium on Computer Arithmetic, pages 193-9, 1995.
[7] K. Iwamura, T. Matsumoto, and H. Imai. Montgomery modular-multiplication method and systolic arrays suitable for modular exponentiation. Electronics and Communications in Japan, Part 3, 77(3):40 5, March 1994.
[8] D. Knuth. The Art of Computer Programming. Volume 2: Seminumerical Algorithms. Addison-Wesley, Reading, Massachusetts, 2nd edition, 1981.
[9] P. Kornerup. A systolic, linear-array multiplier for a class of right-shift algorithms. IEEE Transactions on Computers, 43(8):892-8, August 1994.
[10] P. Montgomery. Modular multiplication without trial division. Mathematics of Computation, 44(170):519-21, April 1985.
[11] J. Quisquater and C. Couvreur. Fast decipherment algorithm for RSA public key cryptosystem. Electronics Letters, 18:905-7, October 1982.
[12] R. Rivest, A. Shamir, and L. Adleman. A method for obtaining digital signatures and public key cryptosystems. Communications of the ACM, 21(2):120-6, Feb. 1978.
[13] M. Shand and J. Vuillemin. Fast implementations of RSA cryptography. In Proceedings 11th IEEE Symposium on Computer Arithmetic, 1993.
[14] D. R. Stinson. Cryptography, Theory and Practice. CRC Press, 1995.
[15] N. Takagi. A radix-4 modular multiplication hardware algorithm efficient for iterative modular multiplications. In Proceedings 10th IEEE Symposium on Computer Arithmetic, pages 35-42, 1991.
[16] A. Tiountchik. Systolic modular exponentiation via Montgomery algorithm. Electronic Letters, 34(9):874-5, April 1998.
[17] J. Vuillemin, P. Bertin, D. Roncin, M. Shand, H. Touati, and P. Boucard. Programmable active memories: Reconfigurable systems come of age. IEEE Transactions on VLSI Systems, 4(1):56-69, Mar. 1996.
[18] C. Walter. Fast modular multiplication using 2-power radix. International Journal of Computer Mathematics, 39(1-2):21-8, 1991.
[19] C. Walter. Systolic modular multiplication. IEEE Transactions on Computers, 42(3):376-8, March 1993.
[20] P. Wang. New VLSI architectures of RSA public key cryptosystems. In Proceedings of 1997 IEEE International Symposium on Circuits and Systems, volume 3, 1997.
[21] E. D. Win, S. Mister, B. Preneel, and M. Wiener. On the performance of signature schemes based on elliptic curves. In Algorithmic Number Theory Symposium III. Springer-Verlag, 1998.
[22] Xilinx Inc., San Jose, CA. The Programmable Logic Data Book.
[23] J. Yong-Yin and W. Burleson. VLSI array algorithms and architectures for RSA modular multiplication. IEEE Transactions on VLSI Systems, 5(2):211-7, June 1997.