LUT Optimization for Distributed Arithmetic-Based Block Least Mean Square Adaptive Filter

Abstract: In this paper, we analyze the contents of the lookup tables (LUTs) of the distributed arithmetic (DA)-based block least mean square (BLMS) adaptive filter (ADF) and, based on that analysis, propose intra-iteration LUT sharing to reduce its hardware resources, energy consumption, and iteration period. The proposed LUT optimization scheme offers a saving of 60% in LUT content for block size 8, and a still higher saving for larger block sizes, over the conventional design approach. The logic size, area, and power consumption of the proposed architecture are analyzed using Xilinx ISE 14.2.

Enhancement of the project:

Existing System: The distributed arithmetic (DA)-based design approach has been proposed to derive low-complexity hardware structures for ADFs. The DA-based ADF uses lookup tables (LUTs) to calculate the filter output and the weight-increment terms, and these LUTs constitute most of its hardware resources. An earlier DA-based LMS ADF structure uses two separate LUTs for the calculation of the filter output and the weight-increment terms. A few design schemes have been suggested in the recent past for efficient realization of the LMS ADF on FPGAs. A DA-based pipelined structure has been proposed for the realization of the delayed LMS ADF with low adaptation delay. Subsequently, another DA-based design has been proposed for LMS ADFs, in which a single LUT performs both filtering and weight updating and a parallel LUT-update method reduces the LUT-update time. Carry-save accumulation is used to further reduce the iteration period of the DA-based LMS structure. A few DA-based designs have also been proposed for the FPGA realization of the BLMS ADF. We have proposed a DA structure for the BLMS ADF. Although many DA-based designs have been suggested for LMS- and BLMS-based ADFs, we do not find any LUT optimization scheme in the literature specific to the BLMS DA-LUT.
In this paper, we analyze the intra-iteration LUT contents of the DA-based BLMS ADF design to find the redundant LUT words that can be shared to minimize hardware resources, the number of LUT accesses, energy consumption, and the iteration period.

Disadvantages:
1) The LUT size is large
2) LUT-update is complex

Proposed System: Allred et al. have identified the LUT redundancy corresponding to successive iterations of the DA-based LMS ADF, and based on that, only half of the auxiliary LUT contents is updated. No LUT optimization scheme, however, has been proposed to take advantage of redundant LUT values in the DA-LMS computation. We observe that, in the DA-based LMS ADF, the redundant LUT values belong to different processing cycles and must be stored either in the LUT or outside it, which consumes the same amount of resources. Therefore, the redundant LUT values of the DA-based LMS ADF do not offer any LUT optimization beyond reducing the number of LUT words to be updated. In the DA-based BLMS ADF, however, the redundant LUT values of L successive iterations, where L is the block size, are created within a single processing cycle, which makes LUT optimization possible. Conventionally, 16NP LUT words are required to implement the NP LUTs of the LU matrix. For filter length N = 16 and block size L = 4, 256 LUT words are required to implement the LU matrix. The contents of the LU matrix of the BLMS filter for block size L = 4 are shown in Fig. 1. The LUT content is represented by the function E(.), which enumerates the sums of the 16 possible combinations of the elements of an input vector.
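The enumeration function E(.) and the conventional word count can be illustrated with a short sketch. It assumes that each LUT covers four input samples (hence 16 words), that the LUT is addressed by one bit from each of four weights, and that P = L/4, which is consistent with the 256 words quoted for N = 16 and L = 4. The helper names `build_da_lut`, `da_inner_product`, and `conventional_lut_words` are hypothetical and are not taken from the paper.

```python
def build_da_lut(samples):
    """Return the 2**len(samples) partial sums E(.) of a sample vector.

    Word `addr` holds the sum of samples[j] for every bit j set in `addr`.
    """
    lut = []
    for addr in range(1 << len(samples)):
        lut.append(sum(s for j, s in enumerate(samples) if (addr >> j) & 1))
    return lut


def da_inner_product(lut, weights, bits):
    """Bit-serial DA inner product of the LUT's samples with `weights`.

    Assumption: weights are unsigned `bits`-bit integers. Two's-complement
    weights would need a sign-bit correction that is omitted here.
    """
    acc = 0
    for b in range(bits):
        # Bit b of every weight forms the LUT address for this cycle.
        addr = sum(((w >> b) & 1) << j for j, w in enumerate(weights))
        acc += lut[addr] << b  # shift-accumulate the selected partial sum
    return acc


def conventional_lut_words(n, l, inputs_per_lut=4):
    """Words in N*P LUTs of 2**inputs_per_lut words, with P = L / inputs_per_lut (assumed)."""
    p = l // inputs_per_lut
    return (1 << inputs_per_lut) * n * p


x = [3, -1, 4, 2]   # four input samples stored in one LUT
w = [5, 2, 7, 1]    # four 3-bit unsigned weights
lut = build_da_lut(x)
print(len(lut))
print(da_inner_product(lut, w, bits=3), sum(a * b for a, b in zip(x, w)))
print(conventional_lut_words(16, 4))
```

The two values on the second printed line agree, which shows that the LUT-based shift-accumulate loop reproduces the ordinary inner product for this example.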

Fig. 1. LUT content of the LU matrix of block size L = 4 for four consecutive iterations [kth, (k + 1)th, (k + 2)th, and (k + 3)th]. Light gray: LUTs of successive iterations with identical content. The input argument $s_{i,0}^{k}$ for 0 ≤ i ≤ 3 of the first column of LU is defined for the kth iteration input block {x(n) ... x(n - 3)}, where n = kL. {x(n) ... x(n - 3)}: input sequence {x(n), x(n - 1), x(n - 2), x(n - 3)}. Gray: succeeding LUTs with overlapped input vectors.

Intra-iteration LUT Sharing
The LUT content depends on the argument $s_{ij}^{k,p}$ of the LUT enumeration function E, which does not change during an iteration. We analyze the arguments $s_{ij}^{k,p}$ corresponding to one column of the LU matrix to find the redundant values in the LUTs of that column.

Inter-iteration LUT Reuse
As shown in Fig. 1, the LUT contents of the first (M - 1) columns of LUs of any given iteration can be reused by the last (M - 1) columns of LUs during the next iteration, and therefore need not be updated.

Proposed Design Strategy
The entire LUT content needs to be available in the same cycle for LUT words to be shared. Conventional RAM-based LUTs are not suitable for LUT sharing, since in any given cycle they allow access to only one (or a few, in the case of multiported RAM) of the stored LUT values. A register-based LUT (REG-LUT) can be used instead for the proposed DA-based design. Based on these facts, we have arrived at the following design strategy to derive an area-delay-power efficient structure for the DA-based BLMS ADF.
1) A register-based shared LUT is used instead of the conventional RAM-based LUT to exploit intra-iteration LUT sharing.
2) Based on the inter-iteration LUT reuse provision of the BLMS ADF, only one column out of the (N/L) columns of the LU matrix is updated in every iteration.
3) A fully parallel design of the LUT-update unit is used to generate the update values of one LU column and update its contents in one cycle.
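The inter-iteration reuse described above can be pictured as a shifting buffer of LU columns. The sketch below is a simplified model, not the paper's hardware: it assumes that column j at iteration k depends only on input block k - j, and it uses a placeholder tuple in place of the real partial sums. The names `column_from_block`, `lu_columns_direct`, and `run_with_reuse` are hypothetical.

```python
from collections import deque


def column_from_block(block_index):
    """Stand-in for the LUT column derived from one input block.

    Hypothetical placeholder: a real design would hold the partial sums
    of that block's samples; the block index is enough to track reuse.
    """
    return ("column-for-block", block_index)


def lu_columns_direct(k, m):
    """All M columns at iteration k, assuming column j depends on block k - j."""
    return [column_from_block(k - j) for j in range(m)]


def run_with_reuse(num_iterations, m):
    """Maintain the LU matrix while computing only one new column per iteration."""
    # State at iteration 0: columns for blocks 0, -1, ..., -(M - 1).
    columns = deque((column_from_block(-j) for j in range(m)), maxlen=m)
    history = [list(columns)]
    updates = 0
    for k in range(1, num_iterations):
        # The oldest column drops out; the other M - 1 columns shift by one place.
        columns.appendleft(column_from_block(k))
        updates += 1
        history.append(list(columns))
    return history, updates


M = 4  # N / L columns, for example N = 16 and L = 4
history, updates = run_with_reuse(5, M)
for k, cols in enumerate(history):
    # The buffer always matches a from-scratch computation of all columns.
    assert cols == lu_columns_direct(k, M)
    if k > 0:
        # First M - 1 columns of iteration k - 1 reappear as the last M - 1 columns.
        assert cols[1:] == history[k - 1][:-1]
print(updates)
```

Under this model, each iteration after the first computes one column rather than M, which is the saving that item 2 of the design strategy relies on.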

At the block level, the proposed structure is similar to the existing design. However, the internal structures of the LUT-update block and the processing element (PE) of the DA module differ from those of the existing design because of the shared LUTs used in the proposed design. The structure of the DA module of the proposed structure is shown in Fig. 2. Each PE of the DA module uses REG-LUTs instead of RAM-LUTs to make use of the LUT-sharing property. It requires only (16L - 25) registers instead of the 16PL RAM words required by the existing design. The LUT-update unit of the DA module of the proposed structure computes a set of (16L - 25) values to update the LUTs of a PE in one cycle, against the 16 cycles required by the existing design.
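The register count can be checked with a small counting sketch. It rests on an assumed indexing model that is not spelled out in the text: LUT p of row i in one LU column covers the four consecutive samples x(n - i - 4p - j) for j = 0..3, with P = L/4, and the all-zero word is assumed not to need storage. Each LUT word is treated symbolically as the set of samples it sums, so identical sets from overlapping windows count once. The function names are hypothetical.

```python
from itertools import combinations


def shared_words_per_column(l, inputs_per_lut=4):
    """Count the distinct nonzero partial sums needed by one LU column.

    Assumed model: row i, LUT q addresses the samples with delays
    i + 4q .. i + 4q + 3. Every nonempty subset of a window is one word.
    """
    p = l // inputs_per_lut
    words = set()
    for i in range(l):
        for q in range(p):
            start = i + inputs_per_lut * q
            window = range(start, start + inputs_per_lut)
            for size in range(1, inputs_per_lut + 1):
                words.update(frozenset(c) for c in combinations(window, size))
    return len(words)


def conventional_words_per_column(l, inputs_per_lut=4):
    """16 * P * L words: L rows of P LUTs, each with 2**inputs_per_lut words."""
    p = l // inputs_per_lut
    return (1 << inputs_per_lut) * p * l


for l in (4, 8, 16):
    shared = shared_words_per_column(l)
    conventional = conventional_words_per_column(l)
    saving = 1 - shared / conventional
    print(l, shared, 16 * l - 25, conventional, f"{saving:.1%}")
```

Under these assumptions the enumeration reproduces 16L - 25 (39 words for L = 4 and 103 for L = 8), and 103 against 256 conventional words corresponds to the roughly 60% saving quoted for block size 8.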

Fig. 2. Structure of the DA module of the proposed DA-based BLMS ADF of filter length N and block size L, where N = ML.

Advantages:
1) Reduced LUT size
2) Reduced LUT-update complexity

Software implementation:
1) ModelSim
2) Xilinx ISE