A Compact 3-D VLSI Classifier Using Bagging Threshold Network Ensembles


IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 14, NO. 5, SEPTEMBER 2003

Amine Bermak, Member, IEEE, and Dominique Martinez

Abstract: A bagging ensemble consists of a set of classifiers trained independently and combined by a majority vote. Such a combination improves generalization performance but can require large amounts of memory and computation, a serious drawback for portable real-time pattern recognition applications. We report here a compact three-dimensional (3-D) multiprecision very large-scale integration (VLSI) implementation of a bagging ensemble. In our circuit, individual classifiers are decision trees implemented as threshold networks: one layer of threshold logic units (TLUs) followed by combinatorial logic functions. The hardware was fabricated using 0.7-µm CMOS technology and packaged using MCM-V micro-packaging technology. The 3-D chip implements up to 192 TLUs operating at a speed of up to 48 GCPPS and implemented in a volume of (w × L × h) = (2 × 2 × 0.7) cm³. The 3-D circuit features a high level of programmability and flexibility, making it possible to use the hardware resources efficiently in order to reduce the power consumption. Successful operation of the 3-D chip for various precisions and ensemble sizes is demonstrated through an electronic nose application.

Index Terms: Bagging, decision trees, threshold networks, very large-scale integration (VLSI), three-dimensional (3-D) packaging technology.

I. INTRODUCTION

Combining multiple classifiers (such as neural networks or decision trees) to build an ensemble is an advanced pattern recognition technique which has gained increasing attention within the machine learning community. Bagging and boosting are two popular methods proposed in order to create accurate ensembles (see compilation of papers at ). The two methods rely on resampling techniques to obtain different training sets for each of the individual classifiers.
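The resampling-and-vote scheme behind bagging can be sketched in a few lines. The following is a minimal illustration only, assuming a hypothetical `train` callable supplied by the user that maps a training set to a classifier function. The names `bootstrap_sample`, `majority_vote`, and `bagging` are illustrative and do not come from the paper.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) examples from data uniformly, with replacement."""
    return [rng.choice(data) for _ in data]

def majority_vote(labels):
    """Return the most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]

def bagging(data, train, n_members, seed=0):
    """Train n_members classifiers, each on its own bootstrap sample.

    `train` is a hypothetical user-supplied callable: it takes a list of
    (x, label) pairs and returns a function x -> label.
    """
    rng = random.Random(seed)
    members = [train(bootstrap_sample(data, rng)) for _ in range(n_members)]
    # The ensemble decision is the majority vote of the member decisions.
    return lambda x: majority_vote([m(x) for m in members])
```

Each member sees a different resampled training set, which is what lets the vote average out the instability of the individual classifiers.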
The resulting combined classifier is generally more robust and accurate than a single classifier trained on the original dataset. However, ensembles suffer from some shortcomings, as stated by Dietterich [1]: "While ensembles provide very accurate classifiers, there are problems that may limit their practical applications. One problem is that ensembles can require large amounts of memory to store and large amounts of computation to apply." Thus, this scheme can be put to efficient practical use only if good hardware implementation strategies are developed.

Manuscript received September 15. A. Bermak is with the Electrical and Electronics Engineering Department, Hong Kong University of Science and Technology, Kowloon, Hong Kong. D. Martinez is with LORIA, Vandoeuvre-Les-Nancy 54506, France. Digital Object Identifier /TNN

In this paper, we describe a proof-of-concept compact three-dimensional (3-D) chip that we believe can meet the computational requirement of bagging ensembles. In our chip, individual classifiers are decision trees implemented as threshold networks (binary neural networks having a layer of threshold logic units (TLUs) followed by combinatorial logic elements). The prototype combines silicon very large-scale integration (VLSI)-based circuits with compact 3-D packaging technology, whereby the computational power is increased by stacking VLSI chips vertically using a micropackaging technology referred to as multichip-module-vertical (MCM-V). Selective gas detection was used as a test-bed for the 3-D chip operating as a compact and low power pattern recognition classifier for an electronic nose application. In addition, novel design features such as multiprecision and hardware reconfigurability were introduced in order to make the 3-D chip a general problem solving system. The 3-D chip can be configured to implement, with a programmable precision, any threshold network topology.
To the best of our knowledge, this is the very first 3-D VLSI implementation of bagging ensembles. In Section II, the performance of bagging decision trees specified as threshold network ensembles is evaluated for different precision requirements. Section III describes the hardware architecture of the basic chip and its main features, including the multiprecision computation and reconfigurability concept. Section IV details the VLSI implementation of the basic prototype, the multichip module, and the final 3-D packaged circuit. Section V presents the experimental results and the chip performance operating as a systolic processor as well as a bagging ensemble applied to odor discrimination for electronic nose applications.

II. ALGORITHMIC CONSIDERATIONS

A. Bagging Decision Trees

Bagging [2] is a popular and effective technique for improving classification performance by creating ensembles. Bagging uses random sampling with replacement from the original data set in order to obtain different training sets. Because the sampled data set has the same size as the original one, many of the original examples may be repeated while others may be left out. On average, 63% of the original data appears in the sampled training set [2]: an example is left out of a sample of size $N$ with probability $(1 - 1/N)^N \approx e^{-1} \approx 0.37$. Each individual classifier is built on its own training set by applying the same learning algorithm. The resulting classifiers are then combined by a simple majority vote. It is well known that bagging significantly improves classifiers that are unstable in the sense that small perturbations in the training data may result in large changes in the generated classifier [2]. Empirical evaluations have shown that bagging improves decision trees or neural networks [3], [4] but does not improve the k-nearest neighbor algorithm [2]. For the nearest neighbor algorithm, a test case may change classification only if its nearest neighbor in the original dataset is not picked in at least half of the sampled training sets. The probability that this occurs gets very small as the number of classifiers within the ensemble gets larger [2]. A similar reasoning can be applied to support vector machines [5], [6], which are stable classifiers. Whether bagging decision trees are more accurate than bagging neural networks depends on the particular data set, but on average they have similar performance and there is no clear evidence to prefer one or the other [3], [4]. However, because decision trees are fast to build with standard procedures (like CART [7], C4.5 [8], or OC1 [9]) and can be interpreted as a series of rules, they are often used in bagging ensembles.

TABLE I. DATASETS USED IN THE EXPERIMENTS. THE FIRST THREE ARE TAKEN FROM THE UCI REPOSITORY. SEE SECTION V-D FOR EXPLANATIONS ON HOW THE ODOR DATASET WAS OBTAINED.

Fig. 1. Equivalence between decision trees and threshold networks. (a) and (c) Two examples of a tree and a threshold network, respectively. (b) and (d) Their respective partitions of the input space. Each node of the tree corresponds to a separating hyperplane and each leaf corresponds to a given class. Each class can be represented by a logical function that combines a set of nodes. In our example, it can be seen from the tree structure that class 1 $= ac + a\bar{c}d$, class 2 $= \bar{a}b + a\bar{c}\bar{d}$, and class 3 $= \bar{a}\bar{b}$. Note, however, that the logical function for class 1 extracted by our program is class 1 $= ac + ad$, which is better optimized.

B. Decision Trees as Threshold Networks

Hardware implementation is made easier if one considers decision trees as threshold networks, as evidenced through the example shown in Fig. 1. The decision tree shown in Fig.
1(a) implements a classifier that discriminates between three classes (denoted 1, 2, and 3) shown in Fig. 1(b). Each node in the tree is a TLU implementing a linear discriminant and each leaf is associated with a given class. Classifying an input pattern then reduces to a sequence of binary decisions, starting from the root node and ending when a leaf is reached. Each class can then be represented by a logical function that combines the binary decisions encountered at the nodes. A decision tree can, thus, be considered as a threshold network having a hidden layer of TLUs followed by one logical function per class. Note that the architecture of such threshold networks is similar to some binary neural networks that use a single hidden layer of threshold neurons followed by an XOR gate [10], [11] or by a combination of AND and OR gates [12]-[15]. The equivalence between decision trees and binary neural networks or threshold networks was first noted by Sethi [16], [17].

Once a decision tree has been constructed, it is a simple matter to convert it into an equivalent threshold network by extracting one logical function per class from the tree structure. A logical function for a given class has a number of conjunctions equal to the number of leaves associated with this class (see Fig. 1(a) and the logical expressions reported in the figure caption). While it is not possible to reduce this number without losing the equivalence between the decision tree and the threshold network, it may be possible to simplify the conjunctions themselves. Any node that has a leaf of a given class as a child can be removed from the other conjunctions of the same class. For example, node c in Fig. 1(a) has a leaf of class 1. The conjunction associated with this leaf is $ac$. It is easy to check that node c can be removed from the other conjunction of class 1, which then reduces to $ad$.

TABLE II. LEAVE-ONE-OUT ACCURACY (IN %) FOR THE DIFFERENT DATASETS. DECISION TREES WERE TRAINED WITH OC1 AND TRANSFORMED INTO EQUIVALENT THRESHOLD NETWORKS. SEE TEXT FOR DETAILS ON THE PROCEDURE USED FOR CREATING INDIVIDUAL THRESHOLD NETWORKS AND ENSEMBLES OF THRESHOLD NETWORKS. THE NUMBERS IN BRACKETS INDICATE THE TENFOLD CROSS-VALIDATION ACCURACY TAKEN FROM [3] AND [4] FOR SINGLE DECISION TREES OR BAGGING ENSEMBLES OF TEN DECISION TREES TRAINED WITH C4.5. FOR SVM, THE ERROR PENALTY PARAMETER IS C = 1000, POLYNOMIAL KERNELS HAVE BEEN USED, AND THE DEGREES OF THE POLYNOMIAL HAVE BEEN ADJUSTED SEPARATELY TO GET THE BEST PERFORMANCE ON EACH INDIVIDUAL DATASET.

To evaluate the performance of bagging decision trees specified as threshold network ensembles, we performed discrimination experiments on the four datasets summarized in Table I. Previous work suggests that ensembles with ten members are adequate to improve the classification performance on these datasets [3], [4]. Thus, ensembles of ten decision trees were created by using bagging and transformed to ensembles of ten threshold networks. CART [7], C4.5 [8], and OC1 [9] are perhaps the most popular tree building algorithms. Here OC1 was used because it seems to perform better than the others (smaller and more accurate decision trees) [9]. OC1 is a randomized algorithm that builds oblique decision trees by simulated annealing. The OC1 program available at ftp://ftp.cs.jhu.edu/pub/oc1 was modified to incorporate bagging. Moreover, an additional program was written in C in order to transform each decision tree into an equivalent threshold network by automatically extracting one optimized logical function per class from the tree structure. Because the datasets we used were small, generalization performance was estimated by a leave-one-out procedure. Table II reports the leave-one-out performance of bagging decision trees implemented as threshold network ensembles in comparison to that of single threshold networks and support vector machines (SVMs) [5], [6]. Our SVM program used for the comparisons was written in C and uses a quadratic programming method originating from [18], implemented in [19], and available at . For these datasets, threshold network ensembles were always more accurate than single threshold networks, which agrees with previous findings [3], [4]. Moreover, they outperformed SVMs on two of the four datasets.

C. What Is the Required Precision?

Threshold networks require only TLUs and combinatorial logic and are very suitable for a VLSI implementation [20]. The threshold function is indeed easy to implement in digital logic, and this results in a significant silicon area saving as compared to the sigmoidal or radial basis functions used in multilayer perceptrons or RBF networks and implemented through area-consuming lookup tables. This simplification results in very compact arithmetic units, and makes the prospect of building VLSI chips implementing bagging threshold networks particularly promising for real-time decisions. However, when implementing bagging threshold networks with hardware of limited precision, the sensitivity to weight perturbation may result in performance degradation. It is, therefore, very important to study carefully the precision requirements for the classification problem at hand. Weights and inputs are then coded with the minimum precision that does not degrade the classification performance too much. We have evaluated the effect of weight precision on the performance of bagging decision trees for the datasets used above. After training, the weight vectors of each individual threshold network were normalized and uniformly quantized with $b$ bits of precision by assigning $2^b$ uniform intervals over the weight range. Table III reports the performance of threshold network ensembles with weights quantized with 16, 8, and 4 bits.
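The normalization and uniform quantization step can be sketched as follows. This is a minimal illustration under stated assumptions: the weight vector is normalized by its largest absolute component so that values fall in [-1, 1], and the b-bit codes are signed two's complement integers. Neither choice is specified in the text above, and the function name `quantize_weights` is hypothetical.

```python
def quantize_weights(weights, bits):
    """Normalize a weight vector and quantize it uniformly to `bits` bits.

    Assumptions (not taken from the paper): normalization divides by the
    largest absolute weight, and codes are signed integers in the two's
    complement range [-2**(bits-1), 2**(bits-1) - 1].
    """
    scale = max(abs(w) for w in weights)
    if scale == 0:
        return [0 for _ in weights]
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    codes = []
    for w in weights:
        code = round(w / scale * hi)          # map [-1, 1] onto [-hi, hi]
        codes.append(max(lo, min(hi, code)))  # clamp to the representable range
    return codes
```

Under this convention, `quantize_weights([7.0, 3.0, -1.0], 4)` gives `[7, 3, -1]`. Rescaling a whole weight vector (including the threshold term) by a positive constant does not change the decision of a TLU, which is why only the quantization error matters.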
In order to maintain acceptable performance, 16 bits of precision are sufficient for all the datasets. However, the required precision depends on the problem at hand (16 bits for hepatitis against only eight bits for the other datasets). The use of a wordlength larger than the required precision (for example, 16 bits for ionosphere) results in an inefficient usage of the hardware resources (slower processing and higher power consumption). As a consequence, we have chosen to implement threshold network ensembles in hardware with multiprecision. This is particularly beneficial when exploiting the VLSI chip for various problems with different precision requirements. It also makes it possible to use the available hardware resources efficiently and, hence, facilitates the implementation of reasonable size bagging ensembles (see Section III).

TABLE III. LEAVE-ONE-OUT ACCURACY (IN %) OF ENSEMBLES OF TEN THRESHOLD NETWORKS WITH RESPECT TO THE WEIGHT PRECISION (IN NUMBER OF BITS).

III. HARDWARE CONSIDERATIONS

An important issue when implementing reasonable size network ensembles for portable pattern recognition applications is to provide a high level of compactness together with low power operation. To meet these challenges, we chose to implement our hardware using a low cost 3-D packaging technology referred to as MCM-V [21]. It was shown in the literature that 3-D packaging technology enhances most aspects of electronic systems, such as size, weight, speed, and yield, and reduces power consumption by as much as 30% [22]. In addition, 3-D technologies offer interesting options for solving the problem of neural network connectivity [23]. Our VLSI design is divided into three different parts: design, fabrication, and test of 1) the single chip architecture; 2) the multichip module; and 3) the 3-D packaged system.

A. Basic Chip Architecture and Circuit Description

The basic building block chip is based on a two-dimensional (2-D) systolic array architecture.
This array consists of 4 × 4 processing elements (PEs) as shown in Fig. 2(a). The array can be configured, using the control signal (cne), to output either a weighted sum or a TLU decision (Out). Each PE includes a local configurable memory to store either one 16-bit weight, two 8-bit weights, or four 4-bit weights. The 16-bit Xi bus is used to feed the inputs serially, from the least significant bit to the most significant one, with an arbitrary user defined precision. The systolic input and output buses (Si and its output counterpart) are used to interface between basic VLSI chips when a higher number of PEs is needed. When a processor receives an input, it multiplies it by the locally stored weight. A compact low power multiprecision serial parallel multiplier [24] has been used to perform the multiplication within each processor. The output of the multiplier is then added to the partial sum received from the processor located on the left and transmitted to the processor located on the right. Input data are propagated vertically through flip-flops and, therefore, the computation that takes place in row i of the systolic array is repeated at row i + 1 one clock cycle later. All results are collected on the right side of the array. A wider systolic array (more inputs) can be realized by bypassing the activation function and connecting the partial weighted sum across different chips through flip-flops. This is realized using the output multiplexer controlled by the control bit (cne). The outputs may also be configured, using the same internal control bit (cne), to perform the threshold activation function and, hence, to realize a TLU. A ten-bit internal control register is used to configure the 4 × 4 array of PEs in terms of threshold network topology (TLU, matrix operation) and weight precision.
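The data flow through one row of the array can be illustrated in software. The sketch below is a behavioral model only, not the circuit: it assumes inputs are two's complement words presented LSB first, that each PE adds its product to the partial sum arriving from its left neighbor, and that the TLU output is 1 when the sign bit of the final sum is 0. The output polarity, the polarity of `cne`, and all function names are assumptions made for illustration.

```python
def to_bits_lsb_first(value, n_bits):
    """Two's complement encoding of `value`, least significant bit first."""
    return [(value >> i) & 1 for i in range(n_bits)]

def serial_multiply(weight, x_bits):
    """Multiply a stored weight by an input fed one bit at a time.

    Each input bit selects a shifted copy of the weight; the last bit
    (the sign bit in two's complement) carries a negative place value.
    """
    acc = 0
    last = len(x_bits) - 1
    for i, bit in enumerate(x_bits):
        term = (weight << i) if bit else 0
        acc += -term if i == last else term
    return acc

def row_output(weights, inputs, n_bits, cne):
    """One row of PEs: the partial sum moves from left to right.

    `cne` mirrors the role of the control bit in the text: when true the
    row returns the TLU decision, otherwise the raw weighted sum.
    """
    partial = 0
    for w, x in zip(weights, inputs):
        partial += serial_multiply(w, to_bits_lsb_first(x, n_bits))
    if not cne:
        return partial
    return 1 if partial >= 0 else 0  # assumed polarity: sign bit 0 -> output 1
```

Returning the raw sum corresponds to the bypass used to chain chips into a wider array, while the thresholded output corresponds to a complete TLU.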
The activation function is simply realized by detecting the sign bit of the weighted sum, which corresponds to the last bit generated by the serial parallel processor (the most significant bit in two's complement). A sampling circuit is also used in order to detect the sign bit of each TLU and to multiplex the different TLU outputs in time so that only one physical pin is used for the out signal of the entire array. This time multiplexing scheme does not affect the overall speed performance of the system since a systolic architecture is used and data are processed in a pipelined way. For example, the sign bit for row 1 is obtained one clock cycle earlier than that of row 2 and, therefore, only one physical pin is required to sample the data of the two rows. This has allowed us to reduce the number of physical pins and facilitate the interfacing of several chips within the MCM and the 3-D chip. Other methods, such as bus sharing, were also used in order to further reduce the number of physical pins and buses within the architecture. Since the system has different modes of operation, a single bus was assigned different tasks in the different modes.

Fig. 2. (a) Internal architecture of a basic VLSI chip. (b) PE building block diagram. For NO/OP = 1, the PE is operational (OP) and its S output carries the processed result, while for NO/OP = 0, the S input is passed to the output unchanged (NO). FF stands for a flip-flop.

Fig. 2(b) shows the internal block diagram of each PE within the systolic array. It can be seen from this figure that the 4-bit bus is used to load the 16-bit internal weight register and also to provide the partial sum inputs to either the configurable arithmetic unit or the two-input one-output multiplexer, depending on the value of the control bit (NO/OP) stored in the internal flip-flop. When NO/OP = 0, the bus is directly provided to the neighboring PE located on the right and, hence, no processing is performed within the PE; the processor is in a nonoperational (NO) mode. This mode is used in order to obtain three interesting features for the systolic array:

1) Possibility of exhaustive test of an individual PE within the array. This is achieved by programming the PE to be tested as OP and all other PEs as NO.

2) Improved fault tolerance. With the NO mode, it is possible to isolate a defective PE within the systolic array and/or a defective chip within the MCM or the 3-D chip.

3) Efficient loading of the weights using a single bus. Using the NO mode, it is possible to bypass already loaded PEs and provide the bus to unloaded PEs.
This reduces the number of I/O pads of the VLSI chip since all processors are loaded using a single bus. Moreover, the technique is also used to load the data into cascaded chips using a single bus. The weights loading technique is explained in more detail in the next section.

B. Weights Loading Technique

Fig. 3 describes the timing sequence during the loading phase of the chip. To begin the loading process, the signal lw of Fig. 3 must be held high for at least half a cycle. When lw is high, the loading process begins by presenting to the bus the weights of the four PEs forming the left column of the systolic architecture. The lw signal is held by a delay module (D) for four cycles before being passed to the four neighboring PEs located on the right side. At each clock cycle, four bits are stored in each processor. Since each processor contains a 16-bit weight, the loading process of each column of four processors takes four clock cycles. While the lw signal is held, a PE loads data into its internal register. The PE is then automatically programmed as an NO processor. In this case, the configurable arithmetic unit of the processor is bypassed and the bus is automatically allocated to the adjacent PE located on the right side. The weights within the basic VLSI chips are, therefore, loaded using only one 16-bit bus. Loading proceeds in a similar manner for all columns of processing elements as the lw signal is transmitted to the right, as illustrated in Fig. 3. The same bus can be used to load the weights of cascaded chips. This is achieved by connecting the corresponding loading signals of one basic chip to those of the next chip. This technique of loading the weights obviates the cumbersome need for a high number of I/O pads in the MCM and the 3-D VLSI circuit.

Fig. 3. Technique used for loading weights. L corresponds to the loading mode (the S bus is connected to the internal weight register), R corresponds to the reset mode (S = 0), and NO corresponds to the nonoperational mode (the S bus is passed through unchanged). D stands for a delay block.

C. Multiprecision Processing

In order to implement multiprecision processing, the 16-bit arithmetic unit was built as four four-bit processors wired together using a set of multiplexers. Fig. 4 shows the architecture of the configurable arithmetic unit, in which each row consists of a single four-bit processor. Eight multiplexers within each PE are used to change the hardware connections between two adjacent rows of cells in order to obtain a weight precision of 4, 8, or 16 bits. For example, in the 16-bit configuration, the combined weights of the four four-bit processors are considered as a single 16-bit weight and only the buses sin4 and sout4 of Fig. 3 are enabled. PE1 would process the most significant four bits of the weight while PE4 would process the least significant ones. A different setting of the same control bits sets the precision to eight bits. The remaining control bits, such as c3 and c4, are used in order to configure the number of inputs and outputs of the multiprecision processor. For example, it is possible to have a precision of eight bits with either two inputs and one TLU or one input and two TLUs.

D. Reconfigurability

Reconfigurability is defined as the ability of the hardware to be modified in order to fit the topology of the threshold network implemented [25]. Reconfigurability is an issue that needs to be addressed in order to make the relatively expensive hardware solution a general problem solving system. A reconfigurable topology using the architecture is described in Fig. 5. The hardware can be configured to operate at three different weight precisions.
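Since each PE stores one 16-bit, two 8-bit, or four 4-bit weights (Section III-A), the number of weights a 4 × 4 array can hold grows as the precision drops. The sketch below enumerates the resulting (inputs, TLUs) pairs for a single chip. It assumes that both dimensions grow in multiples of the physical array side (four), which is an assumption of this illustration rather than a statement from the text, and the function name is hypothetical.

```python
def single_chip_topologies(bits, side=4, word=16):
    """List (inputs, tlus) pairs that fit a side x side array of PEs.

    Each PE holds word // bits weights, so the array holds
    side * side * (word // bits) weights in total.
    Assumption: inputs and tlus are both multiples of `side`.
    """
    capacity = side * side * (word // bits)
    pairs = []
    for inputs in range(side, capacity + 1, side):
        tlus, rem = divmod(capacity, inputs)
        if rem == 0 and tlus >= side and tlus % side == 0:
            pairs.append((inputs, tlus))
    return pairs

for b in (16, 8, 4):
    print(b, single_chip_topologies(b))
```

Under these assumptions the enumeration yields one pair at 16 bits, two at 8 bits, and three at 4 bits, which agrees with the number of configurations stated for each precision in the description of Fig. 5.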
The input precision is arbitrarily selected by the user and, hence, Xi can take any word length. As can be seen from Fig. 5, depending on the selected precision, different topologies of threshold networks can be configured. $P^{b}_{p \times q}$ stands for a network topology with $b$ bits of precision, $p$ inputs, and $q$ TLUs. For a four-bit precision three configurations are possible, namely $P^{4}_{4 \times 16}$, $P^{4}_{8 \times 8}$, and $P^{4}_{16 \times 4}$. For an eight-bit precision two configurations are possible, $P^{8}_{4 \times 8}$ and $P^{8}_{8 \times 4}$, and only one configuration is possible for a 16-bit precision, $P^{16}_{4 \times 4}$. The available resources of the circuit are a tradeoff between the three parameters ($b$, $p$, and $q$). Lowering the precision makes it possible to increase the number of inputs or TLUs, with the product of the three parameters fixed by a constant term that depends on the number of chips interfaced (16 PEs of 16 bits each for a single chip, and four times as much for four cascaded chips). Larger networks (more inputs per TLU) are obtained by bypassing the activation function and connecting the partial weighted sum across different chips. This is achieved using the output multiplexer controlled by the bit cne.

Fig. 4. Internal schematic of the configurable multiprecision arithmetic unit. 4bitMul denotes a four-bit serial parallel multiplier in which the weights are stored in parallel while the inputs are fed serially from the LSB to the MSB. Mux1I2O is a one-input two-output multiplexer, while Mux2I1O is a two-input one-output multiplexer.

Fig. 5. The different topologies of threshold networks implemented by the circuit shown in Fig. 3 depending on the selected precision. $P^{b}_{p \times q}$ stands for a network topology with $b$ bits of precision, $p$ inputs, and $q$ TLUs. For a four-bit precision three configurations are possible, namely $P^{4}_{4 \times 16}$, $P^{4}_{8 \times 8}$, and $P^{4}_{16 \times 4}$. For an eight-bit precision two configurations are possible, $P^{8}_{4 \times 8}$ and $P^{8}_{8 \times 4}$, and only one configuration is possible for a 16-bit precision, $P^{16}_{4 \times 4}$.

IV. VLSI IMPLEMENTATION, MCM DESIGN, AND 3-D PACKAGING

A. Modularity and System Expansibility

Before explaining the technological process involved in the design of the basic chip, the MCM, and the 3-D package, it is important to show how different chips would be integrated in order to build a more powerful system. This is referred to as modularity and system expansibility and has been widely studied in the literature for neural-network hardware [26]. The circuit shown in Fig. 2 can be expanded horizontally in order to realize a threshold network with more inputs, as well as vertically in order to increase the number of TLUs. Moreover, the circuit can be expanded in terms of the number of classifiers per network ensemble. Fig. 6 shows the interchip connectivity with an example of four basic chips. The figure illustrates the modularity and easy expansibility of the system without requiring extra interfacing circuitry. Special attention was paid to the design of the output buffers, particularly for the global and systolic control buses. Data are communicated from one chip to another in a systolic manner through flip-flops. This is done in order to avoid the accumulation of delays when chips are pipelined together.

Fig. 6. Interchip connectivity illustrating the modularity and expansibility of the system.

B. Single Chip Implementation

We have described in Section III the internal circuitry of the hardware architecture. The chip implements the recall operation of threshold networks as described in Section II and depicted in Fig. 1(c). It includes weight storage memory and switches for topology reconfiguration of the TLUs, together with a local memory which stores the topology of the threshold network (number of TLUs and inputs) and its computational precision. Before implementing the recall operation of any classification problem, the systolic architecture should first be configured in order to realize the required topology and precision. Both the topology and the precision are determined by the content of a 10-bit register. A 16-bit register is also used in each basic chip in order to store the NO/OP configuration required for the 4 × 4 systolic array. The NO/OP register makes it possible to configure the hardware to a smaller topology. The NO mode also makes it possible to force a processor into a stand-by mode, which results in reduced power consumption in the case where smaller network topologies are needed.

Setting up a given topology requires loading the control word into the internal control register. This is done through the bus Xi of Fig. 2, during the loading phase of the weights. Only two clock cycles are needed to load the control register. Each time the topology or the precision has to be changed, the content of the 26-bit register (the 10 topology and precision bits plus the 16 NO/OP bits) needs to be loaded into the chip accordingly. The chip has been fabricated using standard cell 0.7-µm CMOS technology. Fig. 7(a) shows a microphotograph of the fabricated chip with a silicon area of mm. The functionality of the packaged basic chips was fully tested prior to wire bonding the dies on the MCM substrates. Test results will be presented in Section V.

Fig. 7. (a) Microphotograph of the VLSI chip. (b) Photograph of the MCM including four VLSI chips and occupying an area of 2 × 2 cm².

C. MCM and 3-D Packaging

The main objective behind the development of the MCM and the 3-D chip is to develop a powerful, compact, low power pattern recognition system using bagging threshold network ensembles.
After designing the basic chip and successfully testing it, four dies were mounted on a single MCM. Each MCM therefore implements a fully configurable systolic array of 64 16-bit PEs. Fig. 7(b) shows a photograph of the MCM. It includes four VLSI chips mounted onto a flexible laminated substrate (film), and the dies are wire bonded to wiring pads on the substrate. The MCMs are designed such that the test pads are provided along the back-side of the film. The 3-D packaging technology referred to as MCM-V [21] was used to realize the 3-D chip. After the MCMs have been tested, the selected ones are stacked one above the other and encapsulated in epoxy resin. The epoxy is then sawn together with the edge wires of the MCM. The block is then plated with layers of Cu/Ni/Au using standard electroplating techniques. A YAG laser is then used to pattern the surface so that vertical wire tracks are formed on the cube [21]. Interconnections between the layers are realized on the sides of the module. External signals and power supply of the 3-D chip are routed around the bottom side for interconnection to a standard package. This is realized by dedicating the first layer to a custom-made MCM that is laser soldered to the module. Each step in the fabrication of the 3-D module uses a standard and well-characterized technological process. As a consequence, the MCM-V technology is relatively low cost [21], [22]. Fig. 8(a) shows a photograph of the final 3-D chip and Fig. 8(b) shows its internal block diagram. As can be seen from Fig. 8(b), the module includes four substrate layers with four chips on each of the three top levels (12 chips in total). The bottom substrate is fully dedicated to routing the vertical connections to an external PGA package. The size of the final module is 2 × 2 × 0.7 cm³. This represents at least 50% of the volume of an advanced PCB implementation.

Fig. 8. (a) Photograph of the final 3-D chip occupying a volume of (w × L × h) = (2 × 2 × 0.7) cm³. (b) Block diagram of the 3-D chip including four levels of MCM. The three top levels correspond to the one represented in Fig. 8(a), while the bottom one is dedicated to routing the pins to a standard PGA package.

V. RESULTS AND CHIP PERFORMANCE

Several test modes were implemented in order to facilitate the test and the debugging of the chips. The test procedure included the functional test of the single chip, the MCMs, and the 3-D chip. A fault characterization study was carried out prior to mounting the 3-D chip. After extensive functional tests, an experimental setup for selective gas detection and electronic nose application was developed, which provided a test-bed for the 3-D chip operating as a bagging threshold network ensemble classifier.

A. Experimental Setup and Functional Test of the Single Chip

In order to verify the correct operation of the basic chip, two approaches were adopted. First, a Verilog simulator was used in order to generate test vectors from the schematic of the circuit. The test vectors were then inserted into the test program of a Tektronix LV500 digital tester. This has allowed us not only to verify the correct operation of the chip for different topologies and precisions, but also to fully extract its performance. Second, a PCB was designed connecting a basic chip to the parallel port of a PC. The board allows software control of the weights and the reconfiguration of the threshold network topology.

TABLE IV. EXPERIMENTAL RESULTS OF THE PROCESSING TIME FOR THE DIFFERENT TOPOLOGIES AND PRECISIONS DESCRIBED IN FIG. 5. THE RESULTS WERE OBTAINED BY TESTING A LARGE NUMBER OF RECALL CYCLES AND CONSIDERING THE WORST CASE RESULT. THE NUMBER OF CLOCK CYCLES REPORTED CORRESPONDS TO THE FULL CLASSIFICATION OF ONE INPUT PATTERN, AND SHARED PIPELINING CYCLES OF CONSECUTIVE INPUTS ARE NOT SUBTRACTED.

Both tests confirmed that the chip was fully operational at 20 MHz for all configurations of precision and TLU topology, with an average power consumption of 16 mW/MHz. The chip presents a loading time of less than 1 µs. This value corresponds to the time required to load the synaptic words (64 words of eight bits) and the control sequences. Table IV summarizes the performance of the chip as a function of the different topologies reported in Fig. 5. The number of clock cycles reported excludes the ones needed for the loading phase, which is done only once. The inputs were coded with the same number of bits as the synaptic weights: a topology $P^{b}_{p \times q}$ would assume $b$ bits for its inputs and $b$ bits for its synaptic weights. The maximum frequency was obtained by considering a large number of different recall cycles and then taking the worst case performance.
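As a rough check on these figures, and assuming that power scales linearly with clock frequency as the per-megahertz unit suggests, the clock period and the average power of a basic chip at the 20-MHz operating point work out as follows. These are derived values under that assumption, not additional measurements.

```latex
T_{\mathrm{clk}} = \frac{1}{20\ \mathrm{MHz}} = 50\ \mathrm{ns},
\qquad
P_{\mathrm{avg}} \approx 16\ \frac{\mathrm{mW}}{\mathrm{MHz}} \times 20\ \mathrm{MHz} = 320\ \mathrm{mW}.
```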
The processing time for a single recall operation (without including the loading time and without subtracting the shared pipelining cycles of consecutive input patterns) varies from 550 ns to 1400 ns depending on the selected topology and precision. We can note from Table IV that for a topology, the number of clock cycles required (without subtracting the shared pipelining cycles) is equal to. The number of clock cycles is, therefore, independent of the number of inputs. We can also note that the maximum frequency decreases as the number of inputs increases. This is explained by the fact that input data are pipelined vertically through D flip-flops and, hence, a topology with a higher number of TLUs requires more clock cycles without affecting the delay of a basic operation. Partial sums are, however, communicated directly to adjacent processors and, hence, no additional clock cycles are required for a topology with a higher number of inputs, at the expense of an increased delay of a basic operation, which degrades the maximum clock frequency. It should be noted that full pipelining of the operations would increase the maximum clock frequency at the expense of a larger number of clock cycles and additional silicon area for the pipelining blocks (flip-flops).

B. Functional Test of the MCM and Fault Characterization

A total of 50 basic chips were manufactured (44 dies and six packaged chips). After verifying the full functionality of the packaged chips, the remaining 44 dies were mounted on 11 MCMs, each containing four chips. Special care was paid to the design of the MCM in order to make the test and debugging of each chip within the MCM possible. This was achieved by designing special back-side contacts on the MCM such that they would fit on a test socket. Access to interchip connections was also obtained using back-side MCM contacts designed to route the test contacts of the MCM to a standard DIL package. A procedure similar to the one used for the test of a single chip was then employed to test the MCMs. A schematic view of the MCM was designed and simulated using a Verilog simulator. Test vectors were then generated and automatically inserted into the test program of the Tektronix LV500 digital tester. For each MCM, each single chip was separately addressed and tested. Several tests were conducted.

Test 1: Wire bonding and MCM-level connectivity test. A preliminary microscopic check of the wire bonding was conducted before applying test vectors designed to target the connectivity test of the MCM routing signals.

Test 2: Propagation of the control signals within the MCM. This test was designed to check the proper propagation of systolic control signals from one chip to another.
Test 3: Synaptic weight loading test. Test vectors were applied to check the correct loading of the synaptic weights and the internal control registers.

Test 4: NO/OP programming test. Test vectors were applied in order to check the successful programming of the NO/OP feature explained in Section III-A.

Test 5: Functional test. In this test, a Verilog simulator was used to generate test vectors covering functional tests of individual chips, groups of two chips, and four chips, as well as the test of the different configurations.

Fig. 9 summarizes the test results for the 11 MCMs (numbered MCM1 to MCM11). Of the 11 MCMs, six were found to be fully operational (54%). A faulty wire bonding was detected for MCM1 and MCM2. MCM3 passed the connectivity test but failed all the remaining tests. MCM4 and MCM5 successfully passed all tests except the last functional test. Even though MCM4 and MCM5 do not operate properly, their malfunction would not affect the correct operation of neighboring MCMs if they were mounted within the 3-D package. This is made possible by the NO mode, which considerably improves the fault tolerance of the final system. Indeed, two MCMs out of the five faulty ones (40%) do not present catastrophic faults, thanks to the NO feature of the chip.

Fig. 9. Fault characterization of the 11 MCMs with respect to the five tests described. P and F stand for pass and fail results of the test, respectively. Dark shades represent samples with a catastrophic fault. Light shades represent nonoperational samples; however, these defective MCMs can be isolated using the NO mode. Unshaded samples have no fault.

TABLE V EXPERIMENTAL RESULTS OF THE PROCESSING TIME FOR DIFFERENT TOPOLOGIES AND PRECISIONS. THE RESULTS WERE OBTAINED BY TESTING A LARGE NUMBER OF RECALL CYCLES AND CONSIDERING THE WORST-CASE RESULT

C. Functional Test of the 3-D Chip

The six operational MCMs were used to build two 3-D chips (three levels of four chips each).
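The fault characterization of Section V-B, and the resulting count of two 3-D chips, can be tallied with a short sketch. The pass/fail strings below are transcribed from the prose description of Fig. 9 rather than from the figure itself. Two points are assumptions made here for illustration: MCM1 and MCM2 are recorded as failing every test after the wire-bonding fault, and an MCM is treated as isolatable when its NO/OP programming test (Test 4) passes.

```python
# Outcomes of Tests 1-5 per MCM ("P" = pass, "F" = fail), following the text.
# Assumption: MCM1 and MCM2 are marked as failing all tests after the
# wire-bonding fault; the text only states that the fault was detected.
RESULTS = {
    "MCM1": "FFFFF",
    "MCM2": "FFFFF",
    "MCM3": "PFFFF",
    "MCM4": "PPPPF",
    "MCM5": "PPPPF",
    **{f"MCM{i}": "PPPPP" for i in range(6, 12)},
}


def classify(outcome: str) -> str:
    """Label an MCM from its five test outcomes."""
    if outcome == "PPPPP":
        return "operational"
    # Hypothetical rule: if NO/OP programming (Test 4) works, the faulty MCM
    # can be isolated with the NO mode and is not a catastrophic fault.
    if outcome[3] == "P":
        return "isolatable"
    return "catastrophic"


def summarize(results: dict) -> dict:
    """Count MCMs in each category."""
    counts = {"operational": 0, "isolatable": 0, "catastrophic": 0}
    for outcome in results.values():
        counts[classify(outcome)] += 1
    return counts


if __name__ == "__main__":
    counts = summarize(RESULTS)
    faulty = counts["isolatable"] + counts["catastrophic"]
    print(counts)
    print("yield:", counts["operational"] / len(RESULTS))
    print("isolatable share of faulty MCMs:", counts["isolatable"] / faulty)
    print("3-D chips (three MCM levels each):", counts["operational"] // 3)
```

With these inputs the tally gives six operational, two isolatable, and three catastrophic MCMs, which matches the 2-out-of-5 (40%) figure and the two assembled 3-D chips.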
The same procedure was repeated for the test of the 3-D chip. The test procedure of the MCMs and the 3-D chip was greatly facilitated by the NO mode, which allows PEs within the VLSI chips, VLSI chips within the MCM, and MCMs within the 3-D chip to be individually addressed and tested. The 3-D chip has been successfully tested for all configurations of precision and TLU topology.

Fig. 10. Experimentally measured sequence for the loading acknowledgment signal Lw and the outputs of the different TLUs. Ch1, Ch2, Ch3, Ch4, and Ch5 represent the clock signal, the loading acknowledgment signal, and the TLU outputs from the first, second, and third chip, respectively. The TLU outputs are sequentially fed out at each rising edge of the clock.

Fig. 11. Experimental setup (left) and time response of the sensor array to a transient concentration of ethanol (right). The time indicated by the arrow is the time when the concentration step is applied. The time indicated by the vertical dashed line is the measurement time corresponding to the steady-state sensor response.

Table V summarizes the performance of the 3-D chip as a function of some of the possible topologies and sizes of the network ensembles. The test configurations were set similarly to the ones reported for the test of a single chip. The processing time for a single recall operation (without including the loading time and excluding the shared pipelining cycles) varies from 611 ns to 2111 ns depending on the size of the ensemble, the selected topology of the threshold network, and the precision. We can note from Table V that the hardware resources of the 3-D chip are a tradeoff between the size of the ensemble (first column of the table) and the topology of the threshold network. The hardware resources available is deduced according to:. A complete recall operation is obtained in less than 1.2 µs for any topology with 4-bit or 8-bit precision, while it is obtained in less than 2.2 µs for any 16-bit precision. The 3-D chip presents a loading time of 10.8 µs. This value corresponds to the time required to load the synaptic words (768 words of eight bits) and the control sequences. It should be noted that, for a 4-bit precision, none of the reported topologies requires pipelining chips together, as evidenced by the third column of the table.
This is explained by the fact that, for a 4-bit precision, each basic chip is able to cope with 16 inputs and, therefore, more TLUs are obtained simply by using more basic chips without communicating partial sums between chips. This is not the case for 8-bit and 16-bit precisions, where each basic chip can only handle eight inputs for an 8-bit precision or four inputs for a 16-bit precision. For example, the configuration requires pipelining two basic chips and the partial sums are passed from one chip to another through flip-flops. This results in additional clock cycles, where is the number of pipelined chips. We can also note from Table V that the number of clock cycles required for the network is exactly the same as the one reported in Table IV for a single chip implementing while the frequency is slightly better for the latter case. This is explained by the fact that the network is obtained by operating 12 chips in parallel. Each chip realizes one element of the ensemble. This results in exactly the same number of clock cycles, while the frequency is slightly reduced for the network due to the additional wiring delay from the output pad to the external PGA pin of the 3-D package. It can also be noticed from Table V that the performances of configurations, and are identical. This is simply because each of these configurations is obtained by just reorganizing the number of individual classifiers and the number of TLUs per classifier.

Fig. 10 shows the experimental output from the chip, which corresponds to a full recall cycle for the topology reported in Table V. A 16-bit 4 × 4 weight matrix was loaded into each chip within the 3-D prototype. A test vector was also fed serially to the chip from the least significant bit to the most significant one. The values of both and were chosen so that the outputs of two adjacent rows of the systolic array would have opposite signs and hence the TLU outputs (Out signal) would oscillate at. Fig.
10 shows the output waveforms, with Ch1, Ch2, Ch3, Ch4, and Ch5 representing the clock signal, the loading acknowledgment signal, and the TLU outputs from the first, second, and third chip, respectively. A chip within each MCM has been selected for the purpose of this test. The first TLU output is obtained eight clock cycles after the loading acknowledgment signal. The 192 TLU outputs are obtained in only 23 clock cycles using only 12 physical pins. Each output sequentially generates the results corresponding to 16 TLUs, as shown in Fig. 10. This corresponds to a very high level of parallelism realized with very limited physical outputs.

D. Test of the 3-D Chip as an Electronic Nose

It is well known that commercially available gas sensors lack selectivity and, thus, respond to a wide variety of odors. In this section, we report the performance of our 3-D chip combined with a gas sensor array in order to act as an electronic nose. An automated gas delivery experimental setup was developed for extracting volatile compounds at given concentrations from liquids (Fig. 11). It consists of two pumps, two mass flow controllers (MFCs), one bubbler, a gas chamber, and a data acquisition system. Ethanol or butanol vapors were injected into the gas chamber at a flow rate determined by the mass flow controllers. Knowing these flow rates and the saturated vapor pressure at room temperature, the concentration of the injected gas was calculated. We used 16 different concentrations for ethanol ranging from 1360 to 5165 ppm and 15 different concentrations for butanol ranging from 870 to 3050 ppm. We used sensor arrays composed of five commercial TGS Figaro gas sensors (TGS 2600, 2602, 2610, 2611, and 2620). The potential differences across the sensor resistances were measured using a voltage divider with 2.2 kΩ load resistors while keeping the heating voltage constant at 5 V. The sensor output voltages were sampled at a rate of 10 Hz and quantized with an 11-bit analog-to-digital converter. A typical plot of these voltages versus time for a transient concentration of ethanol is shown on the right of Fig. 11. It shows that the sensor array system reacts slowly to a transient gas concentration and takes some minutes to reach the stationary state. This time is a combination of the time needed to fill the chamber and the time needed for the sensors to respond. The steady-state value was recorded for each concentration of the two gases.

Fig. 12. Experimentally measured (a) training and (b) test performance of the odor dataset for an ensemble of ten 8-bit precision threshold networks (full line), ten 4-bit threshold networks (dashed line), and 20 4-bit threshold networks (dotted line). At 20 MHz, the circuit achieves a classification performance of 28 M samples/s, 11 M samples/s, and 4 M samples/s for 4-, 8-, and 16-bit weight precision, respectively, and 11-bit input precision.

Besides being nonselective, gas sensors also present long-term drifts and can be poisoned by a particular gas or an excessive concentration, so that they have to be replaced from time to time. However, this is not easy because gas sensors of the same type do not have exactly the same characteristics due to poor control of the manufacturing process.
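The measurement chain described above (a voltage divider with a 2.2 kΩ load resistor followed by 11-bit quantization) can be sketched as follows. This is a minimal illustration and not the authors' acquisition code. It assumes a 5 V supply across the divider and a unipolar ADC with a 5 V reference, neither of which is given in the text (only the 5 V heating voltage is), and both function names are hypothetical.

```python
def sensor_resistance_ohm(v_sensor: float,
                          v_supply: float = 5.0,
                          r_load_ohm: float = 2200.0) -> float:
    """Sensor resistance from the voltage measured across the sensor.

    Series divider: v_sensor / v_supply = Rs / (Rs + RL).
    v_supply = 5.0 V is an assumption, not a value from the paper.
    """
    if not 0.0 <= v_sensor < v_supply:
        raise ValueError("sensor voltage must lie in [0, v_supply)")
    return r_load_ohm * v_sensor / (v_supply - v_sensor)


def quantize(v: float, v_ref: float = 5.0, bits: int = 11) -> int:
    """Map a voltage to an unsigned ADC code, clamped to the valid range.

    v_ref = 5.0 V and the unipolar coding are assumptions for illustration.
    """
    levels = 2 ** bits
    code = int(v / v_ref * levels)
    return max(0, min(levels - 1, code))


if __name__ == "__main__":
    v = 2.5  # example steady-state reading in volts (made-up value)
    print(sensor_resistance_ohm(v), "ohm")
    print(quantize(v), "of", 2 ** 11 - 1)
```

An 11-bit code gives 2048 levels, so under the assumed 5 V reference one code step corresponds to roughly 2.4 mV.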
To allow for such a sensor replacement within the array, the pattern recognition system has to be robust to some dispersion in the characteristics of the sensors. In order to accomplish this, we recorded the steady-state outputs of four sensor arrays, each one composed of the five TGS Figaro gas sensors (same types as those mentioned above), for the 31 different concentrations of ethanol and butanol. This yielded a total of 124 patterns. This odor dataset (footnote 1) was used in Section II for estimating the performance of threshold network ensembles trained with bagging. However, estimating the performance of the 3-D chip by leave-one-out is difficult, as it requires testing 124 different threshold network ensembles. Instead, we decided to randomly split the odor dataset into a training set (100 patterns) and a test set (24 patterns).

Footnote 1. Available at:

To implement the recall and the training test, the user first latches the synaptic weights and the control bits onto the Xi and buses of the 3-D chip, while presenting a load-in command. This enables the chip to systolically load the data into the different cascaded chips. The load-in signal propagates from PE to PE and from chip to chip, thus allowing each chip to load the weights and the control bit configurations using two input buses. The acknowledgment signal is received from the 3-D chip once it has successfully completed the data loading. The 11-bit input data obtained from the data acquisition system were then fed into the 3-D chip. The test and training performances were experimentally measured on the 3-D chip and compared with the performance obtained by simulation for 4-bit and 8-bit precisions and for ensembles of ten and 20 threshold networks. Fig. 12(a) and (b) show the results obtained for training and test, respectively. A 98% accuracy was obtained for the training set in the case of an ensemble of ten threshold networks with an 8-bit weight precision [solid line of Fig.
12(a)] while the performance dropped to 82% for a 4-bit weight precision (dashed line). Using an ensemble of 20 threshold networks for the 4-bit precision improves the training accuracy by 8 percentage points, resulting in a 90% accuracy (dotted line). A test performance of 96% was obtained in the case of an ensemble of ten threshold networks with 8-bit weight precision [solid line of Fig. 12(b)] while the performance dropped to 84% for a 4-bit weight precision (dashed line). Using an ensemble of 20 threshold networks for the 4-bit precision improves the performance by 3 percentage points, resulting in an 87% accuracy (dotted line). The performance measurements obtained for both training and test match those obtained by simulation. However, the test and training performances dropped for frequencies higher than 20 MHz. This was expected from the functional tests of Section V-C, as the reported maximum frequencies were found to be around 20 MHz. It should be noted, however, that the peak classification rate achievable at a relatively low frequency of 20 MHz is very high. Indeed, 28 M samples/s, 11 M samples/s, and 4 M samples/s are achieved for 4-, 8-, and 16-bit weight precision, respectively, with an input precision of 11 bits, thanks to the very high level of parallelism obtained in the 3-D chip (12 chips operating in parallel) and the relatively small number of clock cycles needed for a classification due to the pipelining properties of the systolic array.

Fig. 13. (a) Experimentally measured classification time as a function of the weight and input precisions for the odor dataset. (b) Experimentally measured power consumption as a function of the classification time for different weight precisions. The input data precision is 11 bits.

TABLE VI PERFORMANCE COMPARISON OF OUR DESIGN WITH SOME NEURAL DIGITAL CIRCUITS REPORTED IN THE LITERATURE [27], [31]–[38]. IN THE TABLE, CPPS IS THE NUMBER OF CONNECTION PRIMITIVES PER SECOND, ENERGY/PC STANDS FOR THE ENERGY PER PRIMITIVE CONNECTION (ENERGY NORMALIZED WITH RESPECT TO BOTH NUMBER OF CONNECTIONS AND NUMBER OF BITS PER CONNECTION), CFG STANDS FOR CONFIGURABLE, AND NA STANDS FOR NOT AVAILABLE

Fig. 13(a) shows the classification time for the input patterns of the odor dataset. The dotted curve corresponds to the data measured on the chip for an 11-bit input precision, while the figures reported for 4-, 8-, and 16-bit input precision are deduced by experimentally measuring the maximum frequency and analytically deriving the classification time. We can note from Fig. 13(a) that the classification time is 0.7 µs for a 4-bit weight precision and an 11-bit input precision. This was obtained by pipelining the input patterns in our systolic architecture implementing 20 threshold networks using the topology (768 connections). This leads to a peak performance of 48 GCPPS. The normalized power per classification time as a function of the weight precision is reported in Fig. 13(b).
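The 48 GCPPS figure can be checked from the numbers quoted in this section. The sketch below assumes the commonly used definition of connection primitives per second, namely connections per second multiplied by the weight and input bit widths; that definition and the function names are assumptions made here and are not spelled out in the text. The samples-per-second helper is one reading that happens to reproduce the reported 28 M samples/s (each of the 20 threshold networks counted once per classification) and should be treated as an interpretation only.

```python
def connections_per_second(connections: int, classification_time_s: float) -> float:
    """Connections evaluated per second for one classification every t seconds."""
    return connections / classification_time_s


def connection_primitives_per_second(connections: int,
                                     classification_time_s: float,
                                     weight_bits: int,
                                     input_bits: int) -> float:
    """CPPS under the assumed definition: CPS x weight bits x input bits."""
    cps = connections_per_second(connections, classification_time_s)
    return cps * weight_bits * input_bits


def samples_per_second(networks: int, classification_time_s: float) -> float:
    """Hypothetical reading of 'samples/s': networks evaluated per second."""
    return networks / classification_time_s


if __name__ == "__main__":
    t = 0.7e-6  # classification time for 4-bit weights and 11-bit inputs
    cpps = connection_primitives_per_second(768, t, weight_bits=4, input_bits=11)
    print(f"{cpps / 1e9:.1f} GCPPS")
    print(f"{samples_per_second(20, t) / 1e6:.1f} M samples/s")
```

With 768 connections, a 0.7 µs classification time, 4-bit weights, and 11-bit inputs, this comes out at roughly 48 GCPPS, in line with the reported peak performance.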
Fig. 13(b) clearly shows that the circuit can operate at very low power for 4-bit precision (0.13 mW per 1 K classifications per second) and 8-bit precision (0.33 mW per 1 K classifications per second). The power consumption for 16-bit precision is around 0.85 mW per 1 K classifications per second, which is comparable to the one reported for the VindAX neural network processor [27]. Table VI further reports the performance comparison between our circuit and the most well-known digital neural network circuits reported in the literature. It should be noted, however, that the comparison of different neural-network hardware is often difficult and can be very tricky, as confirmed by several authors [28], [29]. In Table VI, we therefore briefly describe some selected digital neural-network circuits and report their performance using normalized figures of merit such as connection primitives per second (CPPS) [30] and energy per primitive connection [28]. Even though our main objective is not to build a neural-network accelerator, Table VI shows that the 3-D chip has a computational speed on the order of 50% of those reported for advanced neural-network accelerators. The power consumption is, however, very competitive with VindAX, a very advanced digital neural circuit recently reported [27]. It can also be seen from Table VI that the 3-D chip proposed in this paper has the advantage of being reconfigurable in terms of both precision and topology, offering the possibility of increasing the computational power at lower levels of precision. A lower level of precision also allows for low-power operation [see Fig. 13(b)], as unneeded resources are kept switched off.

VI. DISCUSSION

In this paper, we have reported a 3-D circuit implementation of bagging ensembles as well as its experimental test results. In our circuit, individual classifiers within the ensemble are decision trees specified as threshold networks having a layer of TLUs followed by combinatorial logic elements. We have shown that such bagging threshold network ensembles are more accurate than single classifiers trained on the original dataset and that the required weight precision depends on the application at hand. The proposed architecture supports variable-precision computation (4/8/16-bit), in which it is possible for low-precision applications either to reduce the power consumption or to increase the ensemble size to improve the classification performance. Inefficient usage of the hardware resources is avoided since both the weight and input precisions are user-defined for the application at hand. The proposed architecture also supports a configurable network structure (number of networks per ensemble, number of TLUs and inputs per network).
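The structure summarized above, a layer of TLUs followed by a combinatorial logic function, with the networks combined by a majority vote, can be illustrated in software. This is a toy sketch rather than a model of the hardware: the class and function names, the weights, and the logic functions are made up for the example, the classifier is restricted to two classes, and ties are resolved toward class 0, a choice the paper does not specify.

```python
from typing import Callable, List, Sequence, Tuple

Tlu = Tuple[Sequence[float], float]  # (weights, threshold)


def tlu_output(weights: Sequence[float], threshold: float, x: Sequence[float]) -> int:
    """Threshold logic unit: 1 if the weighted sum reaches the threshold, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) >= threshold else 0


class ThresholdNetwork:
    """One layer of TLUs followed by a Boolean function of their outputs."""

    def __init__(self, tlus: List[Tlu], logic: Callable[[List[int]], int]):
        self.tlus = tlus
        self.logic = logic

    def predict(self, x: Sequence[float]) -> int:
        bits = [tlu_output(w, t, x) for w, t in self.tlus]
        return self.logic(bits)


def majority_vote(networks: List[ThresholdNetwork], x: Sequence[float]) -> int:
    """Bagging-style combination: class 1 only on a strict majority (ties give 0)."""
    votes = sum(net.predict(x) for net in networks)
    return 1 if 2 * votes > len(networks) else 0


# Toy ensemble of three networks on two inputs (illustrative values only).
ENSEMBLE = [
    ThresholdNetwork([([1, 0], 1), ([0, 1], 1)], lambda b: b[0] & b[1]),
    ThresholdNetwork([([1, 1], 2)], lambda b: b[0]),
    ThresholdNetwork([([1, 0], 1)], lambda b: b[0]),
]

if __name__ == "__main__":
    for pattern in ([1, 1], [1, 0], [0, 0]):
        print(pattern, "->", majority_vote(ENSEMBLE, pattern))
```

In a bagging setting each network would be trained on a different bootstrap sample of the training set; here the weights are fixed by hand only to keep the example short.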
In addition, the design includes other novel features such as its weight loading technique and its modular expandability. Statistical tests performed on the manufactured chips showed that the proposed NO/OP feature implemented in our circuit improves the fault tolerance of the system by as much as 40%. In order to meet the high computational requirement of threshold network ensembles, we developed a compact 3-D circuit which includes four layers of MCM integrating 192 16-bit PEs with 768 digital synapses and up to 192 TLUs implemented in a module size of cm. This represents at least 50% the size of a very advanced PCB implementation including the same number of chips. The power consumption of the 3-D chip depends on the selected precision and the required classification speed. The circuit consumes 0.13, 0.33, and 0.85 mW at 1 K classifications per second for 4-, 8-, and 16-bit precision, respectively. The very high level of compactness together with the relatively low-power operation of the 3-D chip makes it a very suitable candidate for portable and compact pattern recognition systems. Successful operation of the 3-D chip for various precisions and ensemble sizes was first demonstrated through extensive functional tests. Operation of the 3-D chip as compact pattern recognition hardware was also demonstrated through an electronic nose application. Experimental results suggest a peak classification performance of 28, 11, and 4 M samples/s for 4-bit, 8-bit, and 16-bit precision, respectively.

A major issue in this work concerns on-chip learning. On-chip learning could have been implemented by means of additional circuitry, at the cost, however, of reducing the space available on the chip for the threshold network ensemble. Whether on-chip learning is necessary or not depends on the application at hand. On-chip learning is attractive when continuous unsupervised training is needed or when the training time may require days of computing.
This is certainly not the case for bagging threshold networks, which only need to be trained once and are very fast to build. As an example, it takes an average of 4.6 s on a Pentium III PC running at 1 GHz to obtain a bagging ensemble of ten threshold networks trained on the odor dataset. This time includes the generation of the sampled data files, the training of the decision trees, and the extraction of the optimized logical functions needed for transforming each decision tree into an equivalent threshold network. The complete leave-one-out procedure on the odor dataset takes only 12 min.

Another related issue that needs consideration concerns the limited precision of the hardware. It might be possible to account for the chosen weight precision of threshold network ensembles by using a boosting procedure. Boosting is similar to bagging except that examples incorrectly classified by previous classifiers are chosen more often in the sampled training set than examples that were correctly classified (for details refer to ). If earlier classifiers are evaluated with quantized weights, boosting will attempt to focus the new classifier on the classification errors whether they come from the chosen weight precision or not. Work is ongoing to test this idea.

ACKNOWLEDGMENT

This work was initiated while the authors were at LAAS-CNRS Toulouse. The authors would like to thank D. Hoeung for his help on extracting optimized logical functions from the tree structure, T. Doconto for wire bonding the MCMs and 3-D plus Electronics, and C. Val for manufacturing the 3-D block.

REFERENCES

[1] T. G. Dietterich, Machine learning research: Four current directions, AI Mag., pp ,
[2] L. Breiman, Bagging predictors, Machine Learning, vol. 24, no. 2, pp ,
[3] D. Opitz and R. Maclin, Popular ensemble methods: An empirical study, J. Artificial Intell. Res., vol. 11, pp ,
[4] R. Maclin and D. Opitz, An empirical evaluation of bagging and boosting, in Proc. AAAI,
[5] C. J. C.
Burges, A tutorial on support vector machines for pattern recognition, Data Mining Knowledge Discovery, vol. 2, no. 2,
[6] V. Vapnik, Statistical Learning Theory. New York: Wiley,
[7] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees. Belmont, CA: Wadsworth Int. Group,
[8] J. R. Quinlan, Induction of decision trees, Machine Learning, vol. 1, pp ,
[9] S. K. Murthy, S. Kasif, and S. Salzberg, A system for induction of oblique decision trees, J. Artificial Intell. Res., vol. 2, pp. 1–33,
[10] D. Martinez and D. Estève, The offset algorithm: Building and learning method for multilayer neural networks, Europhys. Lett., vol. 18, no. 2, pp , 1992.
[11] M. Biehl and M. Opper, Construction algorithm for the parity-machine, Phys. A, vol. 193, no. 3–4, pp ,
[12] S. Knerr, L. Personnaz, and G. Dreyfus, Handwritten digit recognition by neural networks with single-layer training, IEEE Trans. Neural Networks, vol. 6, pp , Nov
[13] H. E. Ayestaran and R. W. Prager, The Logical Gates Growing Network, Cambridge Univ., Cambridge, U.K., Tech. Rep. CUED/F-INFENG/TR 137,
[14] N. K. Bose and A. K. Garga, Neural network design using Voronoi diagrams, IEEE Trans. Neural Networks, vol. 4, pp , Sept
[15] E. Langheld and K. Goser, Generalized Boolean operations for neural networks, in Proc. Int. Joint Conf. Neural Networks, 1990, pp. II-159–II-162.
[16] I. K. Sethi, Entropy net: From decision trees to neural nets, Proc. IEEE, vol. 78, pp , Oct
[17] I. K. Sethi, Neural implementation of tree classifiers, IEEE Trans. Syst., Man, Cybern., vol. 25, pp , Aug
[18] D. Goldfarb and A. Idnani, A numerically stable dual method for solving strictly convex quadratic programs, Math. Programming, vol. 27, pp. 1–33,
[19] M. J. D. Powell, ZQPCVX A Fortran Subroutine for Convex Quadratic Programming, Univ. Cambridge, Cambridge, U.K., Tech. Rep. DAMTP/1983/NA17,
[20] P. D. Moerland and E. Fiesler, Hardware-friendly learning algorithms for neural networks: An overview, in Proc. MicroNeuro'96, Lausanne, Switzerland, 1996, pp
[21] S. P. Larcombe, P. A. Ivey, N. L. Seed, J. M. Stern, and C. M. Val, Electronic systems in dense three-dimensional packages, Electron. Lett., vol. 31, no. 10, pp , June
[22] S. F. Al Sarawi, D. Abbott, and P. D. Franzon, A review of 3-D packaging technology, IEEE Trans. Components, Packaging, Manuf. Technol. B, vol. 21, pp. 2–14, Jan
[23] K. Goser et al., VLSI technologies for artificial neural networks, IEEE Micro Mag., pp , Sept
[24] A. Bermak, D. Martinez, and J. L. Noullet, High-density 16/8/4-bit configurable multiplier, Proc. Inst. Elect. Eng. Circuits Devices Systems, vol. 144, no. 5, pp , Oct
[25] S. Satyanarayana, Y. P. Tsividis, and H. P. Graf, A reconfigurable VLSI neural network, IEEE J. Solid-State Circuits, vol. 27, pp , Jan
[26] T. Serrano-Gotarredona and B. Linares-Barranco, A real-time clustering microchip neural engine, IEEE Trans. VLSI Syst., vol. 4, pp , June
[27] VindAX Processor Silicon, AXEON, Limited, Jan Data Sheet Issue 5.
[28] R. Schuffny, A. Graupner, and J. Schreiter, Hardware for neural networks, presented at the 4th Int. Workshop Neural Networks Applications, Magdeburg, Germany, Mar
[29] Overview of Neural Hardware, J. N. H. Heemskerk. (1995). [Online]. Available: ftp://ftp.mrc-apu.cam.ac.uk/pub/nn/mirre/neurhard.ps
[30] E. Van Keulen, S. Colak, H. Withagen, and H. Hegt, Neural network hardware performance criteria, in Proc. IEEE Int. Conf. Neural Networks, June 1994, pp
[31] MD 1220 Data Sheet, Micro Devices,
[32] B. Friebe, S. Neusser, and B. Hofflinger, SIOP: Application-specific neural hardware, in Proc. MicroNeuro'97, Dresden, Germany, 1997, pp
[33] W. Eppler, T. Fischer, and H. Gemmeke, Neural chip SAND/1 for real time pattern recognition, IEEE Trans. Nuclear Science, vol. 45, pp , Oct
[34] N. Mauduit, M. Duranton, J. Gobert, and J. A. Sirat, Lneuro 1.0: A piece of hardware LEGO for building neural network systems, IEEE Trans. Neural Networks, vol. 3, pp , May
[35] M. Duranton, Lneuro 2.3: Image processing by neural networks, IEEE Micro Mag., vol. 16, pp , Oct
[36] Adaptive Solutions: CNAPS product Information,
[37] U. Ramacher, J. Beichter, and N. Bruls, Architecture of a general-purpose neural signal processor, in Proc. Int. Joint Conf. Neural Networks, vol. I, July 1991, pp
[38] P. Ienne and M. A. Viredaz, GENES IV: A bit-serial processing element for a multi-model neural-network accelerator, in Proc. Int. Conf. Application-Specific Array Processors, vol. I, Oct. 1993, pp

Amine Bermak (M'99) received the M.Eng. and Ph.D.
degrees in electronic engineering from Paul Sabatier University, Toulouse, France, in 1994 and 1998, respectively. He was part of the Microsystems and Microstructures Research Group at the French National Research Center LAAS-CNRS, where he developed a number of VLSI chips for artificial neural network classification and detection applications in a project funded by Motorola-Toulouse. He spent one year at the Advanced Computer Architecture Research Group, York University, York, U.K., where he worked on VLSI implementation of CMM neural networks for vision applications in a project funded by British Aerospace. In 1998, he joined Edith Cowan University, Perth, Australia, as a Research Fellow at the Visual Information Processing Research Group, where he worked on the design of smart vision sensors with on-chip biologically inspired image processing. In January 2000, he became a Lecturer with the School of Engineering and Mathematics, Edith Cowan University, Perth, Australia, where he was promoted to Senior Lecturer in November He is currently an Assistant Professor with the Electrical and Electronic Engineering Department, Hong Kong University of Science and Technology, Hong Kong. His current research interests include VLSI circuits and systems, packaging technologies, CMOS image sensors, and VLSI implementation of signal and image processing algorithms.

Dominique Martinez received the Ph.D. degree in electrical and electronic engineering from Paul Sabatier University, Toulouse, France, in He was a Postdoctoral Fellow at the Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, and the VLSI Group, Harvard University, Cambridge, MA, in 1992 and 1994, respectively. From 1993 to 1999, he was with LAAS-CNRS, Toulouse, where his research was concerned with machine learning (neural networks and support vector machines). In 2000, he joined LORIA, Nancy, France, where his research interests currently focus on biologically inspired neural networks for artificial olfaction (neuromorphic electronic noses).


More information

Decade Counters Mod-5 counter: Decade Counter:

Decade Counters Mod-5 counter: Decade Counter: Decade Counters We can design a decade counter using cascade of mod-5 and mod-2 counters. Mod-2 counter is just a single flip-flop with the two stable states as 0 and 1. Mod-5 counter: A typical mod-5

More information

IT T35 Digital system desigm y - ii /s - iii

IT T35 Digital system desigm y - ii /s - iii UNIT - III Sequential Logic I Sequential circuits: latches flip flops analysis of clocked sequential circuits state reduction and assignments Registers and Counters: Registers shift registers ripple counters

More information

A FOUR GAIN READOUT INTEGRATED CIRCUIT : FRIC 96_1

A FOUR GAIN READOUT INTEGRATED CIRCUIT : FRIC 96_1 A FOUR GAIN READOUT INTEGRATED CIRCUIT : FRIC 96_1 J. M. Bussat 1, G. Bohner 1, O. Rossetto 2, D. Dzahini 2, J. Lecoq 1, J. Pouxe 2, J. Colas 1, (1) L. A. P. P. Annecy-le-vieux, France (2) I. S. N. Grenoble,

More information

Objectives. Combinational logics Sequential logics Finite state machine Arithmetic circuits Datapath

Objectives. Combinational logics Sequential logics Finite state machine Arithmetic circuits Datapath Objectives Combinational logics Sequential logics Finite state machine Arithmetic circuits Datapath In the previous chapters we have studied how to develop a specification from a given application, and

More information

12-bit Wallace Tree Multiplier CMPEN 411 Final Report Matthew Poremba 5/1/2009

12-bit Wallace Tree Multiplier CMPEN 411 Final Report Matthew Poremba 5/1/2009 12-bit Wallace Tree Multiplier CMPEN 411 Final Report Matthew Poremba 5/1/2009 Project Overview This project was originally titled Fast Fourier Transform Unit, but due to space and time constraints, the

More information

VLSI System Testing. BIST Motivation

VLSI System Testing. BIST Motivation ECE 538 VLSI System Testing Krish Chakrabarty Built-In Self-Test (BIST): ECE 538 Krish Chakrabarty BIST Motivation Useful for field test and diagnosis (less expensive than a local automatic test equipment)

More information

Bit Swapping LFSR and its Application to Fault Detection and Diagnosis Using FPGA

Bit Swapping LFSR and its Application to Fault Detection and Diagnosis Using FPGA Bit Swapping LFSR and its Application to Fault Detection and Diagnosis Using FPGA M.V.M.Lahari 1, M.Mani Kumari 2 1,2 Department of ECE, GVPCEOW,Visakhapatnam. Abstract The increasing growth of sub-micron

More information

Based on slides/material by. Topic 14. Testing. Testing. Logic Verification. Recommended Reading:

Based on slides/material by. Topic 14. Testing. Testing. Logic Verification. Recommended Reading: Based on slides/material by Topic 4 Testing Peter Y. K. Cheung Department of Electrical & Electronic Engineering Imperial College London!! K. Masselos http://cas.ee.ic.ac.uk/~kostas!! J. Rabaey http://bwrc.eecs.berkeley.edu/classes/icbook/instructors.html

More information

Random Access Scan. Veeraraghavan Ramamurthy Dept. of Electrical and Computer Engineering Auburn University, Auburn, AL

Random Access Scan. Veeraraghavan Ramamurthy Dept. of Electrical and Computer Engineering Auburn University, Auburn, AL Random Access Scan Veeraraghavan Ramamurthy Dept. of Electrical and Computer Engineering Auburn University, Auburn, AL ramamve@auburn.edu Term Paper for ELEC 7250 (Spring 2005) Abstract: Random Access

More information

NH 67, Karur Trichy Highways, Puliyur C.F, Karur District UNIT-III SEQUENTIAL CIRCUITS

NH 67, Karur Trichy Highways, Puliyur C.F, Karur District UNIT-III SEQUENTIAL CIRCUITS NH 67, Karur Trichy Highways, Puliyur C.F, 639 114 Karur District DEPARTMENT OF ELETRONICS AND COMMUNICATION ENGINEERING COURSE NOTES SUBJECT: DIGITAL ELECTRONICS CLASS: II YEAR ECE SUBJECT CODE: EC2203

More information

UNIT IV CMOS TESTING. EC2354_Unit IV 1

UNIT IV CMOS TESTING. EC2354_Unit IV 1 UNIT IV CMOS TESTING EC2354_Unit IV 1 Outline Testing Logic Verification Silicon Debug Manufacturing Test Fault Models Observability and Controllability Design for Test Scan BIST Boundary Scan EC2354_Unit

More information

Instructions. Final Exam CPSC/ELEN 680 December 12, Name: UIN:

Instructions. Final Exam CPSC/ELEN 680 December 12, Name: UIN: Final Exam CPSC/ELEN 680 December 12, 2005 Name: UIN: Instructions This exam is closed book. Provide brief but complete answers to the following questions in the space provided, using figures as necessary.

More information

Sharif University of Technology. SoC: Introduction

Sharif University of Technology. SoC: Introduction SoC Design Lecture 1: Introduction Shaahin Hessabi Department of Computer Engineering System-on-Chip System: a set of related parts that act as a whole to achieve a given goal. A system is a set of interacting

More information

LFSR Counter Implementation in CMOS VLSI

LFSR Counter Implementation in CMOS VLSI LFSR Counter Implementation in CMOS VLSI Doshi N. A., Dhobale S. B., and Kakade S. R. Abstract As chip manufacturing technology is suddenly on the threshold of major evaluation, which shrinks chip in size

More information

Altera s Max+plus II Tutorial

Altera s Max+plus II Tutorial Altera s Max+plus II Tutorial Written by Kris Schindler To accompany Digital Principles and Design (by Donald D. Givone) 8/30/02 1 About Max+plus II Altera s Max+plus II is a powerful simulation package

More information

Long and Fast Up/Down Counters Pushpinder Kaur CHOUHAN 6 th Jan, 2003

Long and Fast Up/Down Counters Pushpinder Kaur CHOUHAN 6 th Jan, 2003 1 Introduction Long and Fast Up/Down Counters Pushpinder Kaur CHOUHAN 6 th Jan, 2003 Circuits for counting both forward and backward events are frequently used in computers and other digital systems. Digital

More information

Design and Analysis of Modified Fast Compressors for MAC Unit

Design and Analysis of Modified Fast Compressors for MAC Unit Design and Analysis of Modified Fast Compressors for MAC Unit Anusree T U 1, Bonifus P L 2 1 PG Student & Dept. of ECE & Rajagiri School of Engineering & Technology 2 Assistant Professor & Dept. of ECE

More information

Timing Error Detection: An Adaptive Scheme To Combat Variability EE241 Final Report Nathan Narevsky and Richard Ott {nnarevsky,

Timing Error Detection: An Adaptive Scheme To Combat Variability EE241 Final Report Nathan Narevsky and Richard Ott {nnarevsky, Timing Error Detection: An Adaptive Scheme To Combat Variability EE241 Final Report Nathan Narevsky and Richard Ott {nnarevsky, tomott}@berkeley.edu Abstract With the reduction of feature sizes, more sources

More information

Built-In Self-Test (BIST) Abdil Rashid Mohamed, Embedded Systems Laboratory (ESLAB) Linköping University, Sweden

Built-In Self-Test (BIST) Abdil Rashid Mohamed, Embedded Systems Laboratory (ESLAB) Linköping University, Sweden Built-In Self-Test (BIST) Abdil Rashid Mohamed, abdmo@ida ida.liu.se Embedded Systems Laboratory (ESLAB) Linköping University, Sweden Introduction BIST --> Built-In Self Test BIST - part of the circuit

More information

CS 61C: Great Ideas in Computer Architecture

CS 61C: Great Ideas in Computer Architecture CS 6C: Great Ideas in Computer Architecture Combinational and Sequential Logic, Boolean Algebra Instructor: Alan Christopher 7/23/24 Summer 24 -- Lecture #8 Review of Last Lecture OpenMP as simple parallel

More information

CHAPTER 6 ASYNCHRONOUS QUASI DELAY INSENSITIVE TEMPLATES (QDI) BASED VITERBI DECODER

CHAPTER 6 ASYNCHRONOUS QUASI DELAY INSENSITIVE TEMPLATES (QDI) BASED VITERBI DECODER 80 CHAPTER 6 ASYNCHRONOUS QUASI DELAY INSENSITIVE TEMPLATES (QDI) BASED VITERBI DECODER 6.1 INTRODUCTION Asynchronous designs are increasingly used to counter the disadvantages of synchronous designs.

More information

Using on-chip Test Pattern Compression for Full Scan SoC Designs

Using on-chip Test Pattern Compression for Full Scan SoC Designs Using on-chip Test Pattern Compression for Full Scan SoC Designs Helmut Lang Senior Staff Engineer Jens Pfeiffer CAD Engineer Jeff Maguire Principal Staff Engineer Motorola SPS, System-on-a-Chip Design

More information

Lecture 18 Design For Test (DFT)

Lecture 18 Design For Test (DFT) Lecture 18 Design For Test (DFT) Xuan Silvia Zhang Washington University in St. Louis http://classes.engineering.wustl.edu/ese461/ ASIC Test Two Stages Wafer test, one die at a time, using probe card production

More information

Combinational vs Sequential

Combinational vs Sequential Combinational vs Sequential inputs X Combinational Circuits outputs Z A combinational circuit: At any time, outputs depends only on inputs Changing inputs changes outputs No regard for previous inputs

More information

March 13, :36 vra80334_appe Sheet number 1 Page number 893 black. appendix. Commercial Devices

March 13, :36 vra80334_appe Sheet number 1 Page number 893 black. appendix. Commercial Devices March 13, 2007 14:36 vra80334_appe Sheet number 1 Page number 893 black appendix E Commercial Devices In Chapter 3 we described the three main types of programmable logic devices (PLDs): simple PLDs, complex

More information

An MFA Binary Counter for Low Power Application

An MFA Binary Counter for Low Power Application Volume 118 No. 20 2018, 4947-4954 ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu An MFA Binary Counter for Low Power Application Sneha P Department of ECE PSNA CET, Dindigul, India

More information

Retiming Sequential Circuits for Low Power

Retiming Sequential Circuits for Low Power Retiming Sequential Circuits for Low Power José Monteiro, Srinivas Devadas Department of EECS MIT, Cambridge, MA Abhijit Ghosh Mitsubishi Electric Research Laboratories Sunnyvale, CA Abstract Switching

More information

Logic and Computer Design Fundamentals. Chapter 7. Registers and Counters

Logic and Computer Design Fundamentals. Chapter 7. Registers and Counters Logic and Computer Design Fundamentals Chapter 7 Registers and Counters Registers Register a collection of binary storage elements In theory, a register is sequential logic which can be defined by a state

More information

data and is used in digital networks and storage devices. CRC s are easy to implement in binary

data and is used in digital networks and storage devices. CRC s are easy to implement in binary Introduction Cyclic redundancy check (CRC) is an error detecting code designed to detect changes in transmitted data and is used in digital networks and storage devices. CRC s are easy to implement in

More information

Solutions to Embedded System Design Challenges Part II

Solutions to Embedded System Design Challenges Part II Solutions to Embedded System Design Challenges Part II Time-Saving Tips to Improve Productivity In Embedded System Design, Validation and Debug Hi, my name is Mike Juliana. Welcome to today s elearning.

More information

Contents Circuits... 1

Contents Circuits... 1 Contents Circuits... 1 Categories of Circuits... 1 Description of the operations of circuits... 2 Classification of Combinational Logic... 2 1. Adder... 3 2. Decoder:... 3 Memory Address Decoder... 5 Encoder...

More information

Chapter 4. Logic Design

Chapter 4. Logic Design Chapter 4 Logic Design 4.1 Introduction. In previous Chapter we studied gates and combinational circuits, which made by gates (AND, OR, NOT etc.). That can be represented by circuit diagram, truth table

More information

Department of Electrical and Computer Engineering University of Wisconsin Madison. Fall Final Examination CLOSED BOOK

Department of Electrical and Computer Engineering University of Wisconsin Madison. Fall Final Examination CLOSED BOOK Department of Electrical and Computer Engineering University of Wisconsin Madison Fall 2014-2015 Final Examination CLOSED BOOK Kewal K. Saluja Date: December 14, 2014 Place: Room 3418 Engineering Hall

More information

Chapter 7 Memory and Programmable Logic

Chapter 7 Memory and Programmable Logic EEA091 - Digital Logic 數位邏輯 Chapter 7 Memory and Programmable Logic 吳俊興國立高雄大學資訊工程學系 2006 Chapter 7 Memory and Programmable Logic 7-1 Introduction 7-2 Random-Access Memory 7-3 Memory Decoding 7-4 Error

More information

L11/12: Reconfigurable Logic Architectures

L11/12: Reconfigurable Logic Architectures L11/12: Reconfigurable Logic Architectures Acknowledgements: Materials in this lecture are courtesy of the following people and used with permission. - Randy H. Katz (University of California, Berkeley,

More information

Unit 8: Testability. Prof. Roopa Kulkarni, GIT, Belgaum. 29

Unit 8: Testability. Prof. Roopa Kulkarni, GIT, Belgaum. 29 Unit 8: Testability Objective: At the end of this unit we will be able to understand Design for testability (DFT) DFT methods for digital circuits: Ad-hoc methods Structured methods: Scan Level Sensitive

More information

Design for Testability

Design for Testability TDTS 01 Lecture 9 Design for Testability Zebo Peng Embedded Systems Laboratory IDA, Linköping University Lecture 9 The test problems Fault modeling Design for testability techniques Zebo Peng, IDA, LiTH

More information

A MISSILE INSTRUMENTATION ENCODER

A MISSILE INSTRUMENTATION ENCODER A MISSILE INSTRUMENTATION ENCODER Item Type text; Proceedings Authors CONN, RAYMOND; BREEDLOVE, PHILLIP Publisher International Foundation for Telemetering Journal International Telemetering Conference

More information

VLSI Design: 3) Explain the various MOSFET Capacitances & their significance. 4) Draw a CMOS Inverter. Explain its transfer characteristics

VLSI Design: 3) Explain the various MOSFET Capacitances & their significance. 4) Draw a CMOS Inverter. Explain its transfer characteristics 1) Explain why & how a MOSFET works VLSI Design: 2) Draw Vds-Ids curve for a MOSFET. Now, show how this curve changes (a) with increasing Vgs (b) with increasing transistor width (c) considering Channel

More information

PICOSECOND TIMING USING FAST ANALOG SAMPLING

PICOSECOND TIMING USING FAST ANALOG SAMPLING PICOSECOND TIMING USING FAST ANALOG SAMPLING H. Frisch, J-F Genat, F. Tang, EFI Chicago, Tuesday 6 th Nov 2007 INTRODUCTION In the context of picosecond timing, analog detector pulse sampling in the 10

More information

FPGA Design. Part I - Hardware Components. Thomas Lenzi

FPGA Design. Part I - Hardware Components. Thomas Lenzi FPGA Design Part I - Hardware Components Thomas Lenzi Approach We believe that having knowledge of the hardware components that compose an FPGA allow for better firmware design. Being able to visualise

More information

A Symmetric Differential Clock Generator for Bit-Serial Hardware

A Symmetric Differential Clock Generator for Bit-Serial Hardware A Symmetric Differential Clock Generator for Bit-Serial Hardware Mitchell J. Myjak and José G. Delgado-Frias School of Electrical Engineering and Computer Science Washington State University Pullman, WA,

More information

L12: Reconfigurable Logic Architectures

L12: Reconfigurable Logic Architectures L12: Reconfigurable Logic Architectures Acknowledgements: Materials in this lecture are courtesy of the following sources and are used with permission. Frank Honore Prof. Randy Katz (Unified Microelectronics

More information

Lecture 17: Introduction to Design For Testability (DFT) & Manufacturing Test

Lecture 17: Introduction to Design For Testability (DFT) & Manufacturing Test Lecture 17: Introduction to Design For Testability (DFT) & Manufacturing Test Mark McDermott Electrical and Computer Engineering The University of Texas at Austin Agenda Introduction to testing Logical

More information

THE USE OF forward error correction (FEC) in optical networks

THE USE OF forward error correction (FEC) in optical networks IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 52, NO. 8, AUGUST 2005 461 A High-Speed Low-Complexity Reed Solomon Decoder for Optical Communications Hanho Lee, Member, IEEE Abstract

More information

VLSI Test Technology and Reliability (ET4076)

VLSI Test Technology and Reliability (ET4076) VLSI Test Technology and Reliability (ET476) Lecture 9 (2) Built-In-Self Test (Chapter 5) Said Hamdioui Computer Engineering Lab Delft University of Technology 29-2 Learning aims Describe the concept and

More information

PARALLEL PROCESSOR ARRAY FOR HIGH SPEED PATH PLANNING

PARALLEL PROCESSOR ARRAY FOR HIGH SPEED PATH PLANNING PARALLEL PROCESSOR ARRAY FOR HIGH SPEED PATH PLANNING S.E. Kemeny, T.J. Shaw, R.H. Nixon, E.R. Fossum Jet Propulsion LaboratoryKalifornia Institute of Technology 4800 Oak Grove Dr., Pasadena, CA 91 109

More information

Testability: Lecture 23 Design for Testability (DFT) Slide 1 of 43

Testability: Lecture 23 Design for Testability (DFT) Slide 1 of 43 Testability: Lecture 23 Design for Testability (DFT) Shaahin hi Hessabi Department of Computer Engineering Sharif University of Technology Adapted, with modifications, from lecture notes prepared p by

More information

Design of a Low Power Four-Bit Binary Counter Using Enhancement Type Mosfet

Design of a Low Power Four-Bit Binary Counter Using Enhancement Type Mosfet Design of a Low Power Four-Bit Binary Counter Using Enhancement Type Mosfet Praween Sinha Department of Electronics & Communication Engineering Maharaja Agrasen Institute Of Technology, Rohini sector -22,

More information

International Research Journal of Engineering and Technology (IRJET) e-issn: Volume: 03 Issue: 07 July p-issn:

International Research Journal of Engineering and Technology (IRJET) e-issn: Volume: 03 Issue: 07 July p-issn: IC Layout Design of Decoder Using Electrical VLSI System Design 1.UPENDRA CHARY CHOKKELLA Assistant Professor Electronics & Communication Department, Guru Nanak Institute Of Technology-Ibrahimpatnam (TS)-India

More information

V6118 EM MICROELECTRONIC - MARIN SA. 2, 4 and 8 Mutiplex LCD Driver

V6118 EM MICROELECTRONIC - MARIN SA. 2, 4 and 8 Mutiplex LCD Driver EM MICROELECTRONIC - MARIN SA 2, 4 and 8 Mutiplex LCD Driver Description The is a universal low multiplex LCD driver. The version 2 drives two ways multiplex (two blackplanes) LCD, the version 4, four

More information

Design and Simulation of a Digital CMOS Synchronous 4-bit Up-Counter with Set and Reset

Design and Simulation of a Digital CMOS Synchronous 4-bit Up-Counter with Set and Reset Design and Simulation of a Digital CMOS Synchronous 4-bit Up-Counter with Set and Reset Course Number: ECE 533 Spring 2013 University of Tennessee Knoxville Instructor: Dr. Syed Kamrul Islam Prepared by

More information

Chapter 5 Flip-Flops and Related Devices

Chapter 5 Flip-Flops and Related Devices Chapter 5 Flip-Flops and Related Devices Chapter 5 Objectives Selected areas covered in this chapter: Constructing/analyzing operation of latch flip-flops made from NAND or NOR gates. Differences of synchronous/asynchronous

More information

Flip Flop. S-R Flip Flop. Sequential Circuits. Block diagram. Prepared by:- Anwar Bari

Flip Flop. S-R Flip Flop. Sequential Circuits. Block diagram. Prepared by:- Anwar Bari Sequential Circuits The combinational circuit does not use any memory. Hence the previous state of input does not have any effect on the present state of the circuit. But sequential circuit has memory

More information

CHAPTER 6 DESIGN OF HIGH SPEED COUNTER USING PIPELINING

CHAPTER 6 DESIGN OF HIGH SPEED COUNTER USING PIPELINING 149 CHAPTER 6 DESIGN OF HIGH SPEED COUNTER USING PIPELINING 6.1 INTRODUCTION Counters act as important building blocks of fast arithmetic circuits used for frequency division, shifting operation, digital

More information

Power Reduction Techniques for a Spread Spectrum Based Correlator

Power Reduction Techniques for a Spread Spectrum Based Correlator Power Reduction Techniques for a Spread Spectrum Based Correlator David Garrett (garrett@virginia.edu) and Mircea Stan (mircea@virginia.edu) Center for Semicustom Integrated Systems University of Virginia

More information

Report on 4-bit Counter design Report- 1, 2. Report on D- Flipflop. Course project for ECE533

Report on 4-bit Counter design Report- 1, 2. Report on D- Flipflop. Course project for ECE533 Report on 4-bit Counter design Report- 1, 2. Report on D- Flipflop Course project for ECE533 I. Objective: REPORT-I The objective of this project is to design a 4-bit counter and implement it into a chip

More information

WINTER 14 EXAMINATION

WINTER 14 EXAMINATION Subject Code: 17320 WINTER 14 EXAMINATION Model Answer Important Instructions to examiners: 1) The answers should be examined by key words and not as word-to-word as given in the model answer scheme. 2)

More information

A Low Power Delay Buffer Using Gated Driver Tree

A Low Power Delay Buffer Using Gated Driver Tree IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) ISSN: 2319 4200, ISBN No. : 2319 4197 Volume 1, Issue 4 (Nov. - Dec. 2012), PP 26-30 A Low Power Delay Buffer Using Gated Driver Tree Kokkilagadda

More information

Cascadable 4-Bit Comparator

Cascadable 4-Bit Comparator EE 415 Project Report for Cascadable 4-Bit Comparator By William Dixon Mailbox 509 June 1, 2010 INTRODUCTION... 3 THE CASCADABLE 4-BIT COMPARATOR... 4 CONCEPT OF OPERATION... 4 LIMITATIONS... 5 POSSIBILITIES

More information

An FPGA Implementation of Shift Register Using Pulsed Latches

An FPGA Implementation of Shift Register Using Pulsed Latches An FPGA Implementation of Shift Register Using Pulsed Latches Shiny Panimalar.S, T.Nisha Priscilla, Associate Professor, Department of ECE, MAMCET, Tiruchirappalli, India PG Scholar, Department of ECE,

More information

Chapter 3 Evaluated Results of Conventional Pixel Circuit, Other Compensation Circuits and Proposed Pixel Circuits for Active Matrix Organic Light Emitting Diodes (AMOLEDs) -------------------------------------------------------------------------------------------------------

More information

Solution to Digital Logic )What is the magnitude comparator? Design a logic circuit for 4 bit magnitude comparator and explain it,

Solution to Digital Logic )What is the magnitude comparator? Design a logic circuit for 4 bit magnitude comparator and explain it, Solution to Digital Logic -2067 Solution to digital logic 2067 1.)What is the magnitude comparator? Design a logic circuit for 4 bit magnitude comparator and explain it, A Magnitude comparator is a combinational

More information

MODULAR DIGITAL ELECTRONICS TRAINING SYSTEM

MODULAR DIGITAL ELECTRONICS TRAINING SYSTEM MODULAR DIGITAL ELECTRONICS TRAINING SYSTEM MDETS UCTECH's Modular Digital Electronics Training System is a modular course covering the fundamentals, concepts, theory and applications of digital electronics.

More information

Design Project: Designing a Viterbi Decoder (PART I)

Design Project: Designing a Viterbi Decoder (PART I) Digital Integrated Circuits A Design Perspective 2/e Jan M. Rabaey, Anantha Chandrakasan, Borivoje Nikolić Chapters 6 and 11 Design Project: Designing a Viterbi Decoder (PART I) 1. Designing a Viterbi

More information

Technology Scaling Issues of an I DDQ Built-In Current Sensor

Technology Scaling Issues of an I DDQ Built-In Current Sensor Technology Scaling Issues of an I DDQ Built-In Current Sensor Bin Xue, D. M. H. Walker Dept. of Computer Science Texas A&M University College Station TX 77843-3112 Tel: (979) 862-4387 Email: {binxue, walker}@cs.tamu.edu

More information

FDTD_SPICE Analysis of EMI and SSO of LSI ICs Using a Full Chip Macro Model

FDTD_SPICE Analysis of EMI and SSO of LSI ICs Using a Full Chip Macro Model FDTD_SPICE Analysis of EMI and SSO of LSI ICs Using a Full Chip Macro Model Norio Matsui Applied Simulation Technology 2025 Gateway Place #318 San Jose, CA USA 95110 matsui@apsimtech.com Neven Orhanovic

More information

IMPLEMENTATION OF X-FACTOR CIRCUITRY IN DECOMPRESSOR ARCHITECTURE

IMPLEMENTATION OF X-FACTOR CIRCUITRY IN DECOMPRESSOR ARCHITECTURE IMPLEMENTATION OF X-FACTOR CIRCUITRY IN DECOMPRESSOR ARCHITECTURE SATHISHKUMAR.K #1, SARAVANAN.S #2, VIJAYSAI. R #3 School of Computing, M.Tech VLSI design, SASTRA University Thanjavur, Tamil Nadu, 613401,

More information

A dedicated data acquisition system for ion velocity measurements of laser produced plasmas

A dedicated data acquisition system for ion velocity measurements of laser produced plasmas A dedicated data acquisition system for ion velocity measurements of laser produced plasmas N Sreedhar, S Nigam, Y B S R Prasad, V K Senecha & C P Navathe Laser Plasma Division, Centre for Advanced Technology,

More information

CS 110 Computer Architecture. Finite State Machines, Functional Units. Instructor: Sören Schwertfeger.

CS 110 Computer Architecture. Finite State Machines, Functional Units. Instructor: Sören Schwertfeger. CS 110 Computer Architecture Finite State Machines, Functional Units Instructor: Sören Schwertfeger http://shtech.org/courses/ca/ School of Information Science and Technology SIST ShanghaiTech University

More information

A Modified Static Contention Free Single Phase Clocked Flip-flop Design for Low Power Applications

A Modified Static Contention Free Single Phase Clocked Flip-flop Design for Low Power Applications JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.8, NO.5, OCTOBER, 08 ISSN(Print) 598-657 https://doi.org/57/jsts.08.8.5.640 ISSN(Online) -4866 A Modified Static Contention Free Single Phase Clocked

More information

Designing for High Speed-Performance in CPLDs and FPGAs

Designing for High Speed-Performance in CPLDs and FPGAs Designing for High Speed-Performance in CPLDs and FPGAs Zeljko Zilic, Guy Lemieux, Kelvin Loveless, Stephen Brown, and Zvonko Vranesic Department of Electrical and Computer Engineering University of Toronto,

More information

MUHAMMAD NAEEM LATIF MCS 3 RD SEMESTER KHANEWAL

MUHAMMAD NAEEM LATIF MCS 3 RD SEMESTER KHANEWAL 1. A stage in a shift register consists of (a) a latch (b) a flip-flop (c) a byte of storage (d) from bits of storage 2. To serially shift a byte of data into a shift register, there must be (a) one click

More information

LUT Optimization for Memory Based Computation using Modified OMS Technique

LUT Optimization for Memory Based Computation using Modified OMS Technique LUT Optimization for Memory Based Computation using Modified OMS Technique Indrajit Shankar Acharya & Ruhan Bevi Dept. of ECE, SRM University, Chennai, India E-mail : indrajitac123@gmail.com, ruhanmady@yahoo.co.in

More information

Digital Correction for Multibit D/A Converters

Digital Correction for Multibit D/A Converters Digital Correction for Multibit D/A Converters José L. Ceballos 1, Jesper Steensgaard 2 and Gabor C. Temes 1 1 Dept. of Electrical Engineering and Computer Science, Oregon State University, Corvallis,

More information

MODULE 3. Combinational & Sequential logic

MODULE 3. Combinational & Sequential logic MODULE 3 Combinational & Sequential logic Combinational Logic Introduction Logic circuit may be classified into two categories. Combinational logic circuits 2. Sequential logic circuits A combinational

More information

Analogue Versus Digital [5 M]

Analogue Versus Digital [5 M] Q.1 a. Analogue Versus Digital [5 M] There are two basic ways of representing the numerical values of the various physical quantities with which we constantly deal in our day-to-day lives. One of the ways,

More information

Contents Slide Set 6. Introduction to Chapter 7 of the textbook. Outline of Slide Set 6. An outline of the first part of Chapter 7

Contents Slide Set 6. Introduction to Chapter 7 of the textbook. Outline of Slide Set 6. An outline of the first part of Chapter 7 CM 69 W4 Section Slide Set 6 slide 2/9 Contents Slide Set 6 for CM 69 Winter 24 Lecture Section Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary

More information

Integration of Virtual Instrumentation into a Compressed Electricity and Electronic Curriculum

Integration of Virtual Instrumentation into a Compressed Electricity and Electronic Curriculum Integration of Virtual Instrumentation into a Compressed Electricity and Electronic Curriculum Arif Sirinterlikci Ohio Northern University Background Ohio Northern University Technological Studies Department

More information

REDUCING DYNAMIC POWER BY PULSED LATCH AND MULTIPLE PULSE GENERATOR IN CLOCKTREE

REDUCING DYNAMIC POWER BY PULSED LATCH AND MULTIPLE PULSE GENERATOR IN CLOCKTREE Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 5, May 2014, pg.210

More information

Computer Systems Architecture

Computer Systems Architecture Computer Systems Architecture Fundamentals Of Digital Logic 1 Our Goal Understand Fundamentals and basics Concepts How computers work at the lowest level Avoid whenever possible Complexity Implementation

More information

MAHARASHTRA STATE BOARD OF TECHNICAL EDUCATION (Autonomous) (ISO/IEC Certified) WINTER 2018 EXAMINATION MODEL ANSWER

MAHARASHTRA STATE BOARD OF TECHNICAL EDUCATION (Autonomous) (ISO/IEC Certified) WINTER 2018 EXAMINATION MODEL ANSWER Important Instructions to examiners: 1) The answers should be examined by key words and not as word-to-word as given in themodel answer scheme. 2) The model answer and the answer written by candidate may

More information

A VLSI Implementation of an Analog Neural Network suited for Genetic Algorithms

A VLSI Implementation of an Analog Neural Network suited for Genetic Algorithms A VLSI Implementation of an Analog Neural Network suited for Genetic Algorithms Johannes Schemmel 1, Karlheinz Meier 1, and Felix Schürmann 1 Universität Heidelberg, Kirchhoff Institut für Physik, Schröderstr.

More information
