
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 16, NO. 2, MARCH 2005 293

Encoding Strategy for Maximum Noise Tolerance Bidirectional Associative Memory

Dan Shen and Jose B. Cruz, Jr., Life Fellow, IEEE

Abstract: In this paper, the basic bidirectional associative memory (BAM) is extended by choosing the weights in the correlation matrix, for a given set of training pairs, that result in a maximum noise tolerance set for the BAM. We prove that for a given set of training pairs this noise tolerance set is the largest, in the sense that the optimized BAM will recall the correct training pair for any input pattern within the set, while at least one pattern one Hamming distance outside the set will not converge to the correct training pair. This maximum tolerance set is the union of the maximum basins of attraction. A standard genetic algorithm (GA) is used to calculate the weights that maximize the objective function generating the maximum tolerance set. Computer simulations are presented to illustrate the error correction and fault tolerance properties of the optimized BAM.

Index Terms: Bidirectional associative memory (BAM), energy well hyper-radius, neural network training, noise tolerance set.

I. INTRODUCTION

IN 1968, Anderson [6] proposed a memory structure named linear associative memory (LAM), which can be used in hetero-associative pattern recognition. Since the LAM is noise sensitive, optimal LAMs were introduced by Wee [7] and Kohonen [8], who extended the LAM by absorbing the noise. Although good results can be obtained with these early approaches, many theoretical and practical issues, such as network stability and storage capacity, remained unresolved. In 1988, Kosko [1] presented the theory of bidirectional associative memory (BAM) by generalizing the Hopfield network model. As a class of artificial neural networks, BAMs provide massive parallelism, high error correction, and high fault tolerance. However, to form a good BAM, a good encoding strategy is required.

This field has received extensive attention, and substantial effort has been devoted to various learning rules. Kosko [1] provided a correlation learning strategy and proved that the BAM process converges after a finite number of iterations. However, the correlation matrix used by Kosko cannot guarantee that the energy of every training pair is a local minimum; that is, it cannot guarantee recall of every training pair even for a very small set of training data. In the following years, various encoding strategies and learning rules were proposed to improve the capacity and performance of BAM. In 1990, Wang et al. [2] introduced two BAM encoding schemes that increase recall performance at the cost of more neurons. These are multiple training methods, which guarantee recall of all training pairs [3]. In 1993 and 1994, Leung [9], [10] presented the enhanced householder encoding algorithm (EHCA), which was improved by Lenze [11] in 2001 to enlarge the capacity. In 1995, Wang and Don [12] introduced the exponential bidirectional associative memory (eBAM), which uses an exponential encoding rule rather than the correlation scheme. For other types of neural networks, there are good procedures for learning, training, and stability analysis in [13]-[18]. For the conventional BAM, however, the current methods focus only on the training set or the capacity; the noisy neighbor pairs and the noise tolerance set of the BAM have been ignored.

In this paper, we are especially interested in the approach proposed by Wang et al. [2], [3], and we expand the applicability of the BAM. The principal contribution of this paper is the construction of an objective function whose maximum corresponds to the weight vector that results in the maximum noise tolerance set. For a given set of training pairs, any noisy input pair within the tolerance set will converge to the correct training pair. Some basic concepts of BAM are reviewed in Section II. The multiple training concept is then extended in Section III with the optimization-based encoding strategy for constructing the correlation matrix. Four lemmas and two theorems about the new encoding rule are proved in the same section; these provide the foundation for constructing the maximum noise tolerance set. We present a numerical example in Section IV to illustrate the effectiveness of the extended BAM; in this example, a standard GA is used to solve the nonlinear optimization problem and obtain the optimum training weights. Finally, we draw conclusions in Section V.

Manuscript received June 1, 2003; revised November 4, 2003. This work was supported by the Defense Advanced Research Projects Agency (DARPA) under Contract F33615-01-C3151 issued by the Air Force Research Laboratory/Air Vehicles Directorate. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA or AFRL. The authors are with the Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH 43210 USA (e-mail: shen.100@osu.edu; jbcruz@ieee.org). Digital Object Identifier 10.1109/TNN.2004.841793. 1045-9227/$20.00 (c) 2005 IEEE

II. BAM

BAM is a two-layer hetero-associative feedback neural network model first introduced by Kosko [1]. As shown in Fig. 1, the input layer consists of binary valued neurons and the output layer comprises binary valued components. A BAM can be denoted as a
bidirectional mapping in vector space (Fig. 1 shows the structure of the BAM). The N training pairs (A_i, B_i) can be stored in the correlation matrix as

M = Σ_{i=1}^{N} X_i^T Y_i

where X_i and Y_i are the bipolar modes of A_i and B_i, respectively (i.e., each 0 entry is replaced by -1). Kosko [1] and Haines et al. [4] have proved that after a finite number of iterations the energy converges to a local minimum, where the corresponding pair is a stable point. McEliece et al. [5] have shown that if the training pairs are even coded (each entry ±1 with probability 0.5) and n-dimensional, the storage capacity of the homogeneous BAM is limited: if even-coded stable states are chosen uniformly at random, there is a maximum number of pairs for which most of the original vectors are accurately recalled. For the nonhomogeneous BAM, Haines and Hecht-Nielsen [4] have pointed out that the possible number of stable states is bounded; however, since these stable states are chosen by a rigid geometrical procedure, the storage capacity of the nonhomogeneous BAM is less than this maximum number. Haines and Hecht-Nielsen [4] have also shown that for equal-dimension, uniformly randomly chosen training pairs with a fixed number of entries equal to +1 and the rest equal to -1, a nonhomogeneous BAM can be constructed so that approximately 98% of the chosen pairs are stable states. If the inputs are orthogonal to each other, the cross-talk terms vanish and every training pair is recalled exactly.

III. ENCODING STRATEGY FOR BAM WITH MAXIMUM NOISE TOLERANCE SET

In this enhanced model, we start with a weighted learning rule for the BAM, similar to the multiple training strategy in [3]. For a given set of N training pairs, the weighted correlation matrix is

M = Σ_{i=1}^{N} q_i X_i^T Y_i    (1)

where q_i > 0 is the training weight of the ith pair. To obtain higher accuracy for associative memory and to retrieve one of the nearest training inputs, the output can be fed back to the BAM. Starting with a pair (A, B), a sequence of pairs is determined until it finally converges to an equilibrium point; if the BAM converges for every training pair, it is said to be bidirectionally stable. The sequence is obtained by alternately thresholding the products X M and Y M^T, where n and p are the lengths of the input and output patterns, respectively, and q = (q_1, ..., q_N) is the vector of training weights. In [3], necessary and sufficient conditions are derived for choosing q such that each training pair can be recalled correctly.

The energy of a training pair is defined in (2) in terms of two thresholds for each element of the layers; if all thresholds are zero, the BAM is called homogeneous, and otherwise it is called nonhomogeneous. For each pair (A, B), the Lyapunov (energy) function is defined as

E(A, B) = -X M Y^T

where X and Y are the bipolar modes of A and B. If the energy of a training pair is lower than that of all its neighbors one Hamming distance away, then the training pair can be recalled correctly. The set of neighbor pairs d (d a positive integer) Hamming distances away from a pair (A_i, B_i), denoted N_d(A_i, B_i), is defined in terms of d_x, the Hamming distance between the input layers, and d_y, the Hamming distance between the output layers, with d_x + d_y = d.
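For concreteness, the storage rule and the bidirectional recall dynamics described above can be sketched in code. This is a minimal illustration, not the authors' implementation: the function names and the bipolar-conversion helper are ours, and the recall step simply alternates the thresholded products X M and Y M^T as in Kosko's model.

```python
import numpy as np

def bipolar(v):
    """Map a binary {0,1} pattern to its bipolar {-1,+1} mode."""
    return 2 * np.asarray(v) - 1

def correlation_matrix(pairs, weights=None):
    """Weighted correlation matrix M = sum_i q_i X_i^T Y_i (cf. Eq. (1)).
    With weights=None this reduces to Kosko's unweighted encoding."""
    n, p = len(pairs[0][0]), len(pairs[0][1])
    if weights is None:
        weights = np.ones(len(pairs))
    M = np.zeros((n, p))
    for q, (a, b) in zip(weights, pairs):
        M += q * np.outer(bipolar(a), bipolar(b))
    return M

def recall(M, a, max_iters=100):
    """Bidirectional recall: alternately threshold X M and Y M^T
    until the pair (X, Y) is bidirectionally stable."""
    x = bipolar(a)
    y = np.sign(x @ M); y[y == 0] = 1
    for _ in range(max_iters):
        x_new = np.sign(M @ y); x_new[x_new == 0] = 1
        y_new = np.sign(x_new @ M); y_new[y_new == 0] = 1
        if np.array_equal(x_new, x) and np.array_equal(y_new, y):
            break
        x, y = x_new, y_new
    return x, y

def energy(M, x, y):
    """Lyapunov energy E = -X M Y^T for a bipolar pair."""
    return -float(x @ M @ y)
```

For two stored pairs, a one-bit-corrupted input is pulled back to its training pair, illustrating the error-correcting recall that the weighted encoding of (1) is designed to strengthen.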

Lemma 1: If a training weight vector q satisfies condition (3), which requires each pair differing from a training pair (A_i, B_i) in only the kth bit to have higher energy than (A_i, B_i), then there exists a positive integer r such that any pair in N_d(A_i, B_i), 1 ≤ d ≤ r, has higher energy than (A_i, B_i).

Proof: Wang et al. [2] have proved that if a training weight vector satisfies condition (3), then all training pairs can be recalled correctly. Since a training pair can be recalled correctly if and only if it is a local minimum on the energy surface, any pair one Hamming distance away has higher energy than (A_i, B_i); hence at least r = 1 satisfies the claim.

Definition 1: For a BAM satisfying condition (3), we define the energy well hyper-radius r_i as the maximum r satisfying the following: 1) r ≥ 1; 2) any pair in N_d(A_i, B_i), 1 ≤ d ≤ r, has higher energy than (A_i, B_i); and 3) at least one pair in N_{r+1}(A_i, B_i) has energy lower than or equal to that of at least one pair in N_r(A_i, B_i).

Lemma 2: Given a desired training pair set and a weight vector q satisfying condition (3), for the associated energy well hyper-radii r_i, define S_i as the union of N_d(A_i, B_i) for 0 ≤ d ≤ r_i. Then: 1) any input pair in the set S_i converges to the training pair (A_i, B_i); 2) the sets S_i of distinct training pairs are disjoint; and 3) an upper bound on the energy well hyper-radius follows from the Hamming distances between the training pairs.

Proof: 1) Kosko [1] has pointed out that when a pair is input to a BAM, the network quickly evolves to a local minimum of the system energy. For any input pair in S_i there is a high-energy hill around it, so the BAM is guaranteed to evolve to some pair in S_i; since (A_i, B_i) is the only local minimum of the system energy in S_i, any input pair in S_i converges to the training pair (A_i, B_i). 2) If some pair belonged to both S_i and S_j, i ≠ j, then by conclusion 1) it would converge to both (A_i, B_i) and (A_j, B_j), which is a contradiction; hence the sets are disjoint. 3) Since the sets S_i cannot overlap, the hyper-radii are bounded in terms of the Hamming distances separating the training pairs, which gives the upper bound.

Definition 2: For a given training pair set with weight vector q and the associated energy well hyper-radii, we define S, the union of the sets S_i, as the noise tolerance set of the BAM. Any pair in S input to the BAM converges to the correct training pair.

We want to find the optimal training weight vector that generates a correlation matrix with the maximum energy well hyper-radii and hence the optimum noise tolerance set. In [3], Wang et al. considered only neighbors one Hamming distance away, corresponding to r = 1; their method does not provide any information for determining a noise tolerance set. For each training pair in a training set, with the correlation matrix formed by (1), we define via (4) the energy of any neighbor pair, in which the following quantities
are the position indices of the bits taking complementary values (in bipolar mode, the complementary value of +1 (-1) is -1 (+1); in binary mode, the complementary value of 1 (0) is 0 (1)) for the input pattern, subject to (5), while the corresponding quantity for the output pattern has a similar meaning, subject to (6). We also define the auxiliary quantities in (7). Then, for a fixed weight vector q, the objective function is defined in (8), where F is a weighted sum of the energy differences between each training pair and its neighbor pairs, given in (9), shown at the bottom of the page, in which the sum ranges over all index combinations satisfying conditions (5) and (6), respectively. The penalty term is defined by the cases in (10): if a neighbor pair at Hamming distance d has energy lower than or equal to that of its training pair, a penalty value c_d is subtracted. The series c_1 > c_2 > ... can be generated by formulas (11) and (12), in which each c_d exceeds any total contribution that can accrue at larger distances; it is obvious that the series is strictly decreasing.

Theorem 1 (Maximum Noise Tolerance Theorem): Given a set of training pairs and at least one weight vector satisfying the condition of Lemma 1, if q* denotes the weight vector that maximizes F, as given in (4)-(12) and (13), then the following hold. 1) The BAM has the maximum energy well hyper-radius r*, where r* uniquely satisfies inequality (14). 2) For any pattern one Hamming distance outside the resulting tolerance set, there is at least one pair such that, if it is input to the BAM, the output layer will not converge to the correct training pair.

Proof: We divide the proof into three parts. The first is to show that r* uniquely satisfies inequality (14); the second is to prove that r* is the maximum energy well hyper-radius; the last is to establish the claim for pairs outside the tolerance set. First, given a training weight vector, the energy well hyper-radius depends on the training pair set. Since we put a penalty value on the objective function whenever a neighbor pair has energy lower than or equal to that of its training pair, and the penalty series c_d is strictly decreasing, the objective function
takes its largest value when only one neighbor pair, at distance r* + 1, has energy lower than or equal to that of its training pair; on the other hand, when every neighbor pair at that distance has energy lower than or equal to that of some pair, the objective takes its lowest value. So inequality (14) holds, and when the penalty case of (10) applies, inequality (14) still holds. It can be shown by contradiction that only one unique r* satisfies inequality (14): if some r' ≠ r* also satisfied it, the two versions of (14) would contradict each other; hence inequality (14) is satisfied by a unique r*.

Second, we show that r* is the maximum energy well hyper-radius; the conclusion can be proved by contradiction. If there were a weight vector with an energy well hyper-radius larger than r*, then from the right part of (14) and from (15) the corresponding objective value would exceed the maximum, which is inconsistent with (13), which defines q* as the optimal solution. So r* is the maximum energy well hyper-radius.

Third, since r* is the maximum energy well hyper-radius, for any pair r* + 1 Hamming distances away from a training pair, there is at least one neighbor pair which has energy lower than or equal to that of one pair in the well. If this neighbor pair is input to the BAM, the output pair will not be the correct training pair. So there is at least one input pair outside the tolerance set such that, if it is input to the BAM, the network does not converge to the correct training pair. Hence, the optimum tolerance set is the union of the sets S_i obtained with q*.
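For small patterns, the energy well hyper-radius of Definition 1 (and hence the tolerance set) can be checked by brute-force enumeration of Hamming neighbors. The following sketch assumes bipolar numpy vectors and uses our own names; it is illustrative only, and the enumeration grows combinatorially with the distance d.

```python
import numpy as np
from itertools import combinations

def neighbors(x, y, d):
    """Yield all bipolar pairs at total Hamming distance exactly d from (x, y),
    splitting d between the input layer (dx flips) and output layer (dy flips)."""
    n, p = len(x), len(y)
    for dx in range(d + 1):
        dy = d - dx
        if dx > n or dy > p:
            continue
        for ix in combinations(range(n), dx):
            for iy in combinations(range(p), dy):
                xx, yy = x.copy(), y.copy()
                xx[list(ix)] *= -1
                yy[list(iy)] *= -1
                yield xx, yy

def hyper_radius(M, x, y, r_max):
    """Energy well hyper-radius (cf. Definition 1): the largest r <= r_max such
    that every pair within total Hamming distance r of (x, y) has strictly
    higher energy E = -X M Y^T than (x, y) itself."""
    e0 = -float(x @ M @ y)
    r = 0
    for d in range(1, r_max + 1):
        if all(-float(xx @ M @ yy) > e0 for xx, yy in neighbors(x, y, d)):
            r = d
        else:
            break
    return r
```

For a single stored 4-by-3 pair, the computed radius is 6, one less than the total number of bits, since the complementary pair at distance 7 has equal energy.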

Remarks: The optimum noise tolerance set will be called the maximum noise tolerance set. Each S_i is then the maximum basin of attraction for its training pair; that is, the optimum noise tolerance set is the union of the maximum basins of attraction. It is defined for a fixed training pair set. It is possible to find some method, such as the dummy augmentation in [2], to change the set of training pairs to one with increased separation between the training pairs but the same information content; intuitively, this can increase the probability of finding a larger maximum noise tolerance set because of the increased upper bound on the energy well hyper-radius. There are three types of neighbors for a BAM: 1) those whose output pairs converge to the correct training pairs; 2) those whose deviations are beyond the upper bound and whose output pairs will not converge to the correct training pairs; and 3) others, which may or may not be recalled correctly.

Definition 3: For the fixed upper bound on the energy well hyper-radius, we define a tentative value r_t of r*.

Definition 4: In the Maximum Noise Tolerance Theorem, if we replace r* with r_t, we obtain a tentative optimal weight vector instead of q*. Since it is not unique, we denote the set of such weight vectors by Q(r_t).

Lemma 3: r_t ≤ r* if and only if every pattern at most r_t Hamming distances away from any training pair is recalled correctly using a tentative training weight in Q(r_t).

Proof: From the proof of the Maximum Noise Tolerance Theorem, if r_t ≤ r*, then any pair at most r_t Hamming distances from a training pair has higher energy than the pairs nearer the training pair; it means that any such pattern is recalled correctly. On the other hand, if every pattern at most r_t Hamming distances away from any training pair is recalled correctly using the tentative training weight, then, since r* is the maximum energy well hyper-radius, r_t ≤ r*.

Lemma 4: If r_t ≥ r*, then the hyper-radius achieved by a tentative weight vector in Q(r_t) is exactly r*.

Proof: By Lemma 3, if the achieved radius exceeded r*, a pattern farther than r* from a training pair would be recalled correctly, contradicting the maximality of r*; considering this fact, we conclude that, by Definition 1, the achieved radius is exactly r*.

Theorem 2: For any tentative value r_t, the weight vectors in Q(r_t) yield the same maximum energy well hyper-radius as the optimal solution q*, provided r_t ≥ r*.

Proof: By Definition 4 and Lemmas 3 and 4, we consider all four cases of the relation between r_t and the achieved radius: 1) if both exceed r*, Lemma 4 rules this case out; 2) if r_t ≥ r* and the achieved radius is at most r*, then by Lemma 3 and Lemma 4 the achieved radius equals r*; 3) the case in which the achieved radius exceeds r* is not possible, by Lemmas 3 and 4; and 4) otherwise, by Lemma 3, the achieved radius is again r*. From the above, we conclude the claim.

Remarks: Theorem 2 is very useful for saving computation time. Based on the fact that the smaller the r_t, the shorter the computation time, we can pick a small tentative r_t and calculate the corresponding weights; if we conclude from Lemma 3 that the maximum has not been reached, we increase r_t by 1 and recalculate, until Lemma 4 shows that the achieved radius is r*.

IV. COMPUTER SIMULATIONS

A numerical example taken from [2] is given in this section to evaluate the performance of the extended BAM with optimized training weights. Suppose one wants to distinguish the three pattern pairs shown in Fig. 2. Since 26 is a relatively big number, we use the methodology presented in Theorem 2 and pick 1 as the first tentative value of r_t. In this example, to find the optimum training weights, the objective function defined in (8) is used as the fitness function of the genetic algorithm (GA). The advantage of the algorithm proved in Theorem 2 is shown in Fig. 3. We have used 10,000 randomly generated samples to test the optimized BAM. All training pairs have been recalled correctly, and all noisy input pairs less than 4 Hamming distances away from the training pairs have converged to the correct training pair. We also found a pattern 5 Hamming distances away from training pair 1 that cannot be recalled correctly, as shown in Fig. 4.
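The GA-based weight search used in this section can be sketched as follows. The fitness below is a simplified stand-in for the objective F of (8): it scores only the worst one-Hamming-distance energy margin rather than the full penalty structure of (9)-(12), and all hyperparameters, names, and genetic operators are illustrative assumptions rather than the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(q, X, Y):
    """Stand-in fitness (not the paper's F of (8)): the worst energy margin
    between each training pair and its one-Hamming-distance neighbors under
    the weighted correlation matrix M = sum_i q_i X_i^T Y_i. A larger worst
    margin means deeper, better-separated energy wells."""
    M = sum(qi * np.outer(x, y) for qi, x, y in zip(q, X, Y))
    worst = np.inf
    for x, y in zip(X, Y):
        e0 = -float(x @ M @ y)
        for i in range(len(x)):          # flip one input bit
            xx = x.copy(); xx[i] *= -1
            worst = min(worst, -float(xx @ M @ y) - e0)
        for j in range(len(y)):          # flip one output bit
            yy = y.copy(); yy[j] *= -1
            worst = min(worst, -float(x @ M @ yy) - e0)
    return worst

def ga(X, Y, pop=30, gens=40, sigma=0.3):
    """Minimal real-coded GA over the weight vector q: binary tournament
    selection, blend crossover, Gaussian mutation, and elitism."""
    n = len(X)  # one weight per training pair
    P = rng.uniform(0.5, 1.5, size=(pop, n))
    for _ in range(gens):
        scores = np.array([fitness(q, X, Y) for q in P])
        children = [P[scores.argmax()].copy()]  # elitism: keep the best
        while len(children) < pop:
            i = rng.integers(pop, size=2)
            j = rng.integers(pop, size=2)
            a, b = P[i[scores[i].argmax()]], P[j[scores[j].argmax()]]
            w = rng.random()
            child = w * a + (1 - w) * b + rng.normal(0.0, sigma, n)
            children.append(np.clip(child, 0.01, None))  # keep weights positive
        P = np.array(children)
    scores = np.array([fitness(q, X, Y) for q in P])
    return P[scores.argmax()]
```

Because the weights only need to be computed once per training set, the offline cost of the GA is acceptable, as the paper notes in Section V.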

Fig. 2. Three training pairs. Fig. 3. Maximum F versus computation time. Fig. 4. Input pattern 5 Hamming distances away from training pair 1 (upper) versus the wrong result (lower). Fig. 5. A pattern 4 Hamming distances away from training pair 2 (upper) cannot be recalled by the methodology in [2] and [3]. Fig. 6. The same pattern can be recalled by the optimized BAM.

V. CONCLUSION

We extended the basic BAM using optimized weights for the correlation matrix. For a given set of training pairs, we determined the weights for the training pairs in the BAM correlation matrix that result in the maximum noise tolerance set; any noisy input pair within the tolerance set converges to the correct training pair. We proved that, for a given set of training pairs, the maximum noise tolerance set is the largest in the sense that at least one pair, with Hamming distance one larger than the hyper-radius associated with the optimum noise tolerance set, will not converge to the correct training pair. A standard GA was used to calculate the weights that maximize the objective function.

For BAM applications, the speed of encoding is less important than that of decoding, because the encoding can be computed offline. However, if adaptive encoding must be applied to new desired pairs in real time, the training time should be as short as possible. In the example of this paper, a standard GA was used; it worked well but was relatively inefficient, as calculation times were long and many generations and fitness evaluations were needed to find the optimal solution. Since this calculation is offline, the limitation is not serious. We also compared our optimized BAM with the methodology in [2] and [3]; the simulation results in Figs. 5 and 6 show that our method finds the maximum noise tolerance set, which is not guaranteed by the algorithms in [2] and [3].

ACKNOWLEDGMENT

The authors acknowledge helpful discussions with Dr. G. Chen.
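The random-sample evaluation reported in Section IV can be mimicked with a small Monte Carlo test. The recall routine is restated so the sketch is self-contained; the names, dimensions, and trial counts are our illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)

def recall(M, x, max_iters=50):
    """Bidirectional recall: alternately threshold X M and Y M^T
    (x is a bipolar input vector)."""
    y = np.sign(x @ M); y[y == 0] = 1
    for _ in range(max_iters):
        x2 = np.sign(M @ y); x2[x2 == 0] = 1
        y2 = np.sign(x2 @ M); y2[y2 == 0] = 1
        if np.array_equal(x2, x) and np.array_equal(y2, y):
            break
        x, y = x2, y2
    return x, y

def noise_test(M, X, Y, d, trials=1000):
    """Fraction of sampled inputs exactly d bit-flips away from a randomly
    chosen training input that converge to the correct pair (in the spirit
    of the 10,000-sample test of Section IV)."""
    ok = 0
    for _ in range(trials):
        i = rng.integers(len(X))
        x = X[i].copy()
        flip = rng.choice(len(x), size=d, replace=False)
        x[flip] *= -1
        xr, yr = recall(M, x)
        ok += int(np.array_equal(xr, X[i]) and np.array_equal(yr, Y[i]))
    return ok / trials
```

For a single stored pair, every one-bit-corrupted input falls back into the training pair's energy well, so the measured recall fraction is 1.0; with several stored pairs and larger d, the fraction drops as inputs leave the tolerance set.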

REFERENCES

[1] B. Kosko, "Bidirectional associative memories," IEEE Trans. Syst., Man, Cybern., vol. 18, no. 1, pp. 49-60, Jan. 1988.
[2] Y.-F. Wang, J. B. Cruz, Jr., and J. H. Mulligan, Jr., "Two coding strategies for bidirectional associative memory," IEEE Trans. Neural Netw., vol. 1, no. 1, pp. 81-92, Jan. 1990.
[3] Y.-F. Wang, J. B. Cruz, Jr., and J. H. Mulligan, Jr., "Guaranteed recall of all training pairs for bidirectional associative memory," IEEE Trans. Neural Netw., vol. 2, no. 6, pp. 559-567, Nov. 1991.
[4] K. Haines and R. Hecht-Nielsen, "A BAM with increased information storage capacity," in Proc. IEEE ICNN '88, vol. 1, Jul. 1988, pp. 181-190.
[5] R. McEliece, E. Posner, E. Rodemich, and S. Venkatesh, "The capacity of the Hopfield associative memory," IEEE Trans. Inf. Theory, vol. IT-33, pp. 461-482, Jul. 1987.
[6] J. Anderson, "A memory storage model utilizing spatial correlation functions," Kybernetik, vol. 5, no. 3, pp. 113-119, 1968.
[7] W. G. Wee, "Generalized inverse approach to adaptive multiclass pattern classification," IEEE Trans. Comput., vol. C-17, no. 12, pp. 1157-1164, Dec. 1969.
[8] T. Kohonen and M. Ruohonen, "Representation of associative pairs by matrix operations," IEEE Trans. Comput., vol. C-22, no. 7, pp. 701-702, Jul. 1973.
[9] C. S. Leung, "Encoding method for bidirectional associative memory using projection on convex sets," IEEE Trans. Neural Netw., vol. 4, no. 5, pp. 879-881, Sep. 1993.
[10] C. S. Leung, "Optimum learning for bidirectional associative memory in the sense of capacity," IEEE Trans. Syst., Man, Cybern., vol. 24, no. 5, pp. 791-796, May 1994.
[11] B. Lenze, "Improving Leung's bidirectional learning rule for associative memories," IEEE Trans. Neural Netw., vol. 12, no. 5, pp. 1222-1226, Sep. 2001.
[12] C.-C. Wang and H.-S. Don, "An analysis of high-capacity discrete exponential BAM," IEEE Trans. Neural Netw., vol. 6, no. 2, pp. 492-496, Mar. 1995.
[13] P. Liu and H. Li, "Efficient learning algorithms for three-layer regular feedforward fuzzy neural networks," IEEE Trans. Neural Netw., vol. 15, no. 3, pp. 545-558, May 2004.
[14] G.-B. Huang, "Learning capability and storage capacity of two-hidden-layer feedforward networks," IEEE Trans. Neural Netw., vol. 14, no. 2, pp. 274-281, Mar. 2003.
[15] D. Chakraborty and N. R. Pal, "A novel training scheme for multilayered perceptrons to realize proper generalization and incremental learning," IEEE Trans. Neural Netw., vol. 14, no. 1, pp. 1-14, Jan. 2003.
[16] P. K. H. Phua and D. Ming, "Parallel nonlinear optimization techniques for training neural networks," IEEE Trans. Neural Netw., vol. 14, no. 6, pp. 1460-1468, Nov. 2003.
[17] J.-D. Hwang and F.-H. Hsiao, "Stability analysis of neural-network interconnected systems," IEEE Trans. Neural Netw., vol. 14, no. 1, pp. 201-208, Jan. 2003.
[18] X. Liao and K.-W. Wong, "Robust stability of interval bidirectional associative memory neural network with time delays," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 34, no. 2, pp. 1142-1154, Apr. 2004.

Dan Shen received the B.S. degree in automation from Tsinghua University, Beijing, China, in 1998 and the M.S. degree in electrical engineering from The Ohio State University (OSU), Columbus, in 2003. Currently, he is working toward the Ph.D. degree at OSU. From 1998 to 2000, he was with Softbrain Software Co., Ltd., Beijing, China, as a Software Engineer. He is currently a Graduate Research Associate in the Department of Electrical and Computer Engineering at OSU. His research interests include game theory and its applications, optimal control, and adaptive control.

Jose B. Cruz, Jr. (M'57, SM'61, F'68, LF'95) received the B.S. degree in electrical engineering (summa cum laude) from the University of the Philippines (UP) in 1953, the S.M. degree in electrical engineering from the Massachusetts Institute of Technology (MIT), Cambridge, in 1956, and the Ph.D. degree in electrical engineering from the University of Illinois, Urbana-Champaign, in 1959. He is currently a Distinguished Professor of Engineering and Professor of Electrical and Computer Engineering at The Ohio State University (OSU), Columbus. Previously, he served as Dean of the College of Engineering at OSU from 1992 to 1997, Professor of Electrical and Computer Engineering at the University of California, Irvine (UCI), from 1986 to 1992, and at the University of Illinois from 1965 to 1986. He was a Visiting Professor at MIT and Harvard University, Cambridge, in 1973, and a Visiting Associate Professor at the University of California, Berkeley, from 1964 to 1965. He served as Instructor at UP in 1953-1954 and Research Assistant at MIT from 1954 to 1956. He is the author or coauthor of six books, 21 chapters in research books, and numerous articles in research journals and refereed conference proceedings. Dr. Cruz was elected a member of the National Academy of Engineering (NAE) in 1980. In 2003, he was elected a Corresponding Member of the National Academy of Science and Technology (Philippines). He is also a Fellow of the American Association for the Advancement of Science (AAAS), elected in 1989, and a Fellow of the American Society for Engineering Education (ASEE), elected in 2004. He received the Curtis W. McGraw Research Award of ASEE in 1972 and the Halliburton Engineering Education Leadership Award in 1981. He is a Distinguished Member of the IEEE Control Systems Society and received the IEEE Centennial Medal in 1984, the IEEE Richard M. Emberson Award in 1989, the ASEE Centennial Medal in 1993, and the Richard E. Bellman Control Heritage Award of the American Automatic Control Council (AACC) in 1994. In addition to membership in the NAE, ASEE, and AAAS, he is a Member of the Philippine American Academy for Science and Engineering (founding member, 1980; President, 1982; Chairman of the Board, 1998-2000), the Philippine Engineers and Scientists Organization (PESO), the National Society of Professional Engineers, Sigma Xi, Phi Kappa Phi, and Eta Kappa Nu. He served as a Member of the Board of Examiners for Professional Engineers for the State of Illinois from 1984 to 1986. He has served on various professional society boards and editorial boards, and as an officer of professional societies, including IEEE, where he was President of the Control Systems Society in 1979, Editor of the IEEE TRANSACTIONS ON AUTOMATIC CONTROL, a Member of the Board of Directors from 1980 to 1985, Vice President for Technical Activities in 1982-1983, and Vice President for Publication Activities in 1984-1985. Currently, he serves as Chair (2004-2005) of the Engineering Section of the American Association for the Advancement of Science (AAAS).