Design of an Area-Efficient Interpolated FIR Filter Based on LUT Partitioning


Sun-Young Hwang, Jong-Kwan Choi, and Sik Kim: Sogang University

This paper describes the design of an area-efficient interpolation FIR filter with a partitioned lookup table (LUT) structure. Since the LUT block occupies a large portion of the FIR filter area, the proposed filter structure reduces LUT size in three ways: by partitioning the LUT, by exploiting coefficient symmetry, and by sharing the partitioned LUTs through multiplexing of the input data streams. Experimental results for several benchmark examples show that the proposed filter reduces area by over 40% compared with the popular single-architecture dual-channel filter. Its power consumption is comparable to or less than that of conventional filter structures. The proposed FIR filter was designed for the 3-channel W-CDMA mobile station modulator.

Index Terms: Interpolation FIR filter, Lookup table partitioning, Pulse-shaping.

I. INTRODUCTION

Ongoing research aims to bring multimedia capabilities into digital mobile communication systems, and communication standards specify enabling technologies such as the 3-channel W-CDMA mobile station modulator [1]. Pulse-shaping 1:4 interpolation FIR filters are employed in each band-limited Quadrature Phase Shift Keying (QPSK) modulator. They provide in-band spectral shaping while minimizing intersymbol interference (ISI) [2], [3]. Each channel of the QPSK modulator requires a filtering operation for the in-phase (I data) and quadrature-phase (Q data) components of the input data stream [4]. Hence, a total of six band-limited FIR filters are needed in the 3-channel W-CDMA modulator. An FIR filter using the transversal computation structure [5] adopts a polyphase structure, which pipelines the input data streams across a register chain before the main filtering operation is performed. Despite its simplicity, this structure requires a prohibitively large number of registers. It also incurs area overhead from the added complexity of the pipeline [6].

An alternative FIR filter structure suitable for high-speed filtering employs a LUT for the core filtering operation. All possible filter outputs are pre-calculated and tabulated in memory for every input transition pattern. The input data stream constitutes the LUT address, so each filter output sample is obtained simply by reading off a constant value. Slower dedicated arithmetic operations are thus replaced by faster memory references. LUT-based FIR filters have been widely implemented with ROM-based LUTs or hardwired LUTs [7]. A single-chip implementation using ROM-based LUTs requires large area and high power consumption, and it adds fabrication complexity. Because of the large number of transition patterns, the ROM-based LUT occupies about 99% of the area of the entire filter. Area optimization of the filter therefore centers on a more compact LUT design. To overcome the drawbacks of ROM-based LUTs, the table is hardwired, as in the popular single-architecture dual-channel filter [7]. However, the hardwired LUT still occupies about 60% of the total area, so further area reduction is necessary.

In this paper, we propose the design and implementation of an area-efficient, single-channel, LUT-based pulse-shaping 1:4 interpolation FIR filter. Since the LUT block occupies a large portion of the FIR filter area, the proposed filter structure reduces LUT size in three ways: by partitioning the LUT, by exploiting coefficient symmetry, and by sharing the partitioned LUTs through multiplexing of the input data streams. The proposed FIR filter has been designed for the 3-channel W-CDMA mobile station modulator.

The rest of the paper is organized as follows. Section II gives an overview of the pulse-shaping 1:4 interpolation FIR filter structure employing LUTs. Section III describes the partitioned LUT structure, the symmetric properties of the low-pass filter coefficients, and techniques for sharing the partitioned LUTs by multiplexing the input data streams. Section IV describes the implementation of the proposed single-channel dual-filter architecture, which adopts the features addressed in Section III. Section V presents the experimental results. The final section presents concluding remarks and future research topics.

II. LUT-BASED FIR FILTER

As mentioned in the previous section, FIR filters are employed in the QPSK modulator for spectral shaping while minimizing ISI. For each channel of the QPSK modulator, two 1:4 interpolation FIR filters perform the pulse-shaping operation. Hence, a total of six band-limited FIR filters are needed in the 3-channel W-CDMA modulator. Interpolation FIR filters can be implemented using either the transversal structure or the LUT-based structure. Equation 1 shows the operation of the 48-tap 1:4 interpolation FIR filter. The input signals to each filter, x = {x(n), x(n-1), ..., x(n-11)}, are multiplied by the filter coefficients h_m = {h_0, h_1, ..., h_47}. This produces four output signals: y(4n-3), y(4n-2), y(4n-1), and y(4n). The filter coefficients are stored in ROM in two's complement representation.
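Equation 1 is a polyphase decomposition. Output phase p (p = 0 for y(4n-3) through p = 3 for y(4n)) uses every fourth coefficient, starting at h_p. The minimal Python sketch below evaluates the four outputs for one 12-sample window. It assumes the inputs are already mapped to the values -1, 0, or 1, so each product reduces to adding, subtracting, or skipping a coefficient. The function name `interpolate_1to4` and the test coefficients are hypothetical.

```python
from typing import List, Sequence

TAPS = 48                 # 48-tap filter of Equation 1
PHASES = 4                # 1:4 interpolation -> four outputs per input sample
WINDOW = TAPS // PHASES   # 12 input samples x(n) .. x(n-11)


def interpolate_1to4(x_window: Sequence[int], h: Sequence[int]) -> List[int]:
    """Return [y(4n-3), y(4n-2), y(4n-1), y(4n)] for one input window.

    x_window[j] holds x(n-j) for j = 0..11 and is assumed to be -1, 0, or 1.
    h holds h_0 .. h_47; phase p uses h_p, h_(p+4), ..., h_(p+44).
    """
    if len(x_window) != WINDOW or len(h) != TAPS:
        raise ValueError("expected 12 input samples and 48 coefficients")
    outputs = []
    for p in range(PHASES):
        acc = 0
        for j, xj in enumerate(x_window):
            coeff = h[PHASES * j + p]
            # With x in {-1, 0, 1} a "multiplication" is add, subtract, or skip.
            if xj > 0:
                acc += coeff
            elif xj < 0:
                acc -= coeff
        outputs.append(acc)
    return outputs


# Hypothetical coefficients h_m = m and an all-ones window:
# phase p sums (4j + p) over j = 0..11, i.e. 264 + 12p.
print(interpolate_1to4([1] * 12, list(range(48))))  # [264, 276, 288, 300]
```

The inner loop shows why a transversal realization needs no multipliers for these inputs. It still needs the 12-sample register chain that holds x(n) .. x(n-11).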
Multipliers are unnecessary because the input data are simple binary values: the result of each multiplication is either the coefficient itself or its negation [8]. Figure 1 shows the transversal FIR filter implementation of Equation 1. It adopts a polyphase structure, which pipelines the input data streams across a register chain before the main filtering operation is performed. Compared with the direct method [5], the transversal structure is simple and requires fewer multipliers and adders, which reduces area. The number of functional units is reduced and the structure is simple. However, the pipeline structure requires a prohibitively large number of registers.

$$
\begin{aligned}
y(4n-3) &= x(n)\,h_0 + x(n-1)\,h_4 + x(n-2)\,h_8 + \cdots + x(n-11)\,h_{44}\\
y(4n-2) &= x(n)\,h_1 + x(n-1)\,h_5 + x(n-2)\,h_9 + \cdots + x(n-11)\,h_{45}\\
y(4n-1) &= x(n)\,h_2 + x(n-1)\,h_6 + x(n-2)\,h_{10} + \cdots + x(n-11)\,h_{46}\\
y(4n) &= x(n)\,h_3 + x(n-1)\,h_7 + x(n-2)\,h_{11} + \cdots + x(n-11)\,h_{47}
\end{aligned}
\tag{1}
$$

Figure 2 shows the LUT-based FIR filter structure. With the LUT-based structure, the 48-tap 1:4 interpolation FIR filter can be realized without customary arithmetic operations. Classical FIR filter designs store the filter coefficients in ROM. Instead, each of the four filter outputs in Equation 1 is pre-calculated and tabulated in ROM for every input transition pattern. Therefore, with a 12-bit input x = {x(n), x(n-1), ..., x(n-11)}, implementing Equation 1 directly in LUT form requires a (2^12 × 11 × 4)-bit ROM table. This table stores all possible 11-bit results y(4n-3), y(4n-2), y(4n-1), and y(4n). Each input data value is -1, 0, or 1. The input data values serve directly as the 12-bit LUT address, so the four pre-generated 11-bit outputs are available through a simple memory read. Since the filter outputs are generated from the input data values alone, the overall filter processing time can be reduced dramatically. LUT-based filters are therefore suited to high-speed filtering applications, since uninterrupted filtering can be performed on a streamlined input. However, LUT-based filter designs are plagued by large LUT sizes, so LUT area must be minimized for an efficient FIR filter realization. To obtain the designed outputs, each of the 11-bit outputs is generated at four times the chip clock rate (four phases).

III. PROPOSED FIR FILTER DESIGN

The fast filtering operation made possible by LUTs comes at an area cost. Post-synthesis area measurements indicate that the LUT block in these FIR filters still occupies 60% to 99% of the total filter area, so the LUT area must be reduced. The proposed filter structure therefore reduces LUT size in three ways: by partitioning the LUT, by exploiting coefficient symmetry, and by sharing the partitioned LUTs through multiplexing of the input data streams.

With the output word size fixed, the size of the original LUT is set by the size of the input stream. Each input bit removed halves the number of LUT addresses, and hence halves the LUT size. By the same token, splitting a larger input bit stream into smaller bit clusters allows a single LUT to be partitioned into multiple smaller LUTs. Table 1 shows various partitioning solutions for a 12-bit input stream. Multiway partitioning of the LUT into smaller data clusters reduces the total LUT area. Partitioning the LUT into 2-bit clusters yields six partitioned LUTs and the maximal reduction in total LUT area. However, the glue logic added by partitioning may increase the overall area. Moreover, logic synthesis of LUTs with very small bit-stream clusters fails to produce the anticipated area reductions, because input product and SOP terms are unlikely to be shared among the outputs during synthesis [9].

Table 1: LUT partitioning solutions for a 12-bit input stream

| Input data cluster size* | 6 bits | 3 bits | 2 bits | 1 bit |
|---|---|---|---|---|
| Partitioned LUT unit area (a) | 32 | 4 | 2 | 1 |
| # LUTs used (b) | 2 | 4 | 6 | 12 |
| Total area (a) × (b) | 64 | 16 | 12 | 12 |

* Cluster sizes from a 12-bit input stream.

Figure 3 shows a single-filter block design with three different LUT partitioning solutions. Each n-bit data cluster from the 12-bit shift register is assigned to the appropriate LUT for filtering, and the partial results are added at the output. Because the low-pass filter coefficients are inherently symmetric, the upper-half partitioned LUT blocks form mirror images of the lower-half blocks. Figure 4 shows a 24-tap 1:4 interpolation FIR filter operating on the input data D_n = {D_0, D_1, D_2, D_3, D_4, D_5}. The 24 low-pass filter coefficients h_m = {h_0, h_1, ..., h_23} are symmetric about the center [3]. Thus D_n can be split into two 3-element data sets, each of which performs a separate filtering operation. Note that before the filtering operation, data whose phase index is not '00' are padded with zeros. The proposed 1:4 interpolation FIR filter uses 48 filter coefficients and therefore requires twelve input data elements. The twelve data elements are split into two 6-element sets that are filtered separately, and the two results are added together at the output [2], [5].
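The LUT-based evaluation and the partitioning of Table 1 can be illustrated with a small Python sketch. The sketch makes three assumptions:

- Input bits are antipodal: bit value 1 stands for x = +1 and bit value 0 for x = -1. The zero-padded values mentioned above are not modelled.
- Each LUT row stores all four phases.
- The helper names and the coefficients are hypothetical.

The sketch checks that summing the partial results of the partitioned LUTs reproduces the direct sum of Equation 1. Under the assumed mapping, complementing every address bit negates a row, so only half of each table would need to be stored. That halving gives the totals 64, 16, 12, and 12 listed in Table 1. This is offered only as one consistent reading; the paper does not state how its tables were reduced.

```python
from typing import List, Sequence, Tuple

PHASES, WINDOW = 4, 12
Row = Tuple[int, ...]


def symbols(address: int, width: int) -> List[int]:
    # Assumed mapping: bit k of the address is one input sample; 1 -> +1, 0 -> -1.
    return [1 if (address >> k) & 1 else -1 for k in range(width)]


def direct_filter(window: int, h: Sequence[int]) -> List[int]:
    """Reference: Equation 1 evaluated sample by sample (bit j = x(n-j))."""
    x = symbols(window, WINDOW)
    return [sum(x[j] * h[PHASES * j + p] for j in range(WINDOW)) for p in range(PHASES)]


def build_cluster_lut(h: Sequence[int], first_tap: int, width: int) -> List[Row]:
    """Pre-compute the 4-phase partial outputs of taps first_tap..first_tap+width-1."""
    table = []
    for address in range(2 ** width):
        x = symbols(address, width)
        table.append(tuple(
            sum(x[k] * h[PHASES * (first_tap + k) + p] for k in range(width))
            for p in range(PHASES)))
    return table


def build_partitioned_luts(h: Sequence[int], cluster: int) -> List[List[Row]]:
    """One LUT per cluster; cluster = 12 gives the single 2**12-row table."""
    if WINDOW % cluster:
        raise ValueError("cluster size must divide 12")
    return [build_cluster_lut(h, t, cluster) for t in range(0, WINDOW, cluster)]


def lut_filter(window: int, luts: List[List[Row]], cluster: int) -> List[int]:
    """Address each LUT with its bit cluster and add the partial results."""
    mask = (1 << cluster) - 1
    out = [0] * PHASES
    for i, lut in enumerate(luts):
        row = lut[(window >> (i * cluster)) & mask]
        out = [a + b for a, b in zip(out, row)]
    return out


def stored_rows(cluster: int) -> int:
    # Half of each table suffices if rows for complemented addresses are negated.
    return 2 ** (cluster - 1) * (WINDOW // cluster)


h = [((7 * m) % 23) - 11 for m in range(48)]   # hypothetical coefficients
print(2 ** WINDOW * 11 * 4)                    # 180224 bits for the unpartitioned ROM
for c in (6, 3, 2, 1):
    luts = build_partitioned_luts(h, c)
    assert all(lut_filter(w, luts, c) == direct_filter(w, h) for w in range(0, 4096, 37))
    print(c, stored_rows(c))
```

The loop prints the stored row counts 64, 16, 12, and 12 for 6-, 3-, 2-, and 1-bit clusters. It does not capture the glue logic (adders, multiplexers) that each extra partition adds.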
In general, the number of multiplications and the coefficient storage can be halved by first adding D_n to its reflected counterpart and then multiplying by a single common coefficient. For FIR filters with 1-bit input, however, this technique is unsuitable because it increases hardware complexity. As mentioned in Sections 1(III) and 2(III), the filter coefficients are symmetric. As a result, the proposed single-channel filter blocks in Figure 3 employ two sets of partitioned LUT blocks, one of which is the mirror image of the other. Since duplicated LUT sets waste area, further area optimization is feasible. The design therefore does not perform a single filtering operation with separate LUT pairs. Instead, it is extended to a more efficient dual-filter design that uses only one set of the mirror-imaged LUT blocks, reducing the LUT area by a fourth. Figure 5 shows the proposed single-channel dual-filter structure. Two 12-bit input data streams (Q data and I data) are divided into smaller n-bit clusters (i.e., the 6-bit Q_r, Q_f, I_r, and I_f data clusters). These clusters are multiplexed for optimal sharing of the partitioned LUTs. The LUT-based filtering operation takes the form of memory references, as described in Section 2(II). After it, the partial results for the Q data and I data are piped to the adders to form the final result.

IV. IMPLEMENTATION OF THE PROPOSED FIR FILTER

Figure 6 shows the proposed single-channel dual-filter block, implemented by adopting the features addressed in Section III. The proposed FIR filter structure consists of three blocks: the input stage, the partitioned LUT blocks, and the output stage.
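The mirror-image relation between the upper and lower LUT halves of Section III can be checked numerically. The check assumes symmetric coefficients h_m = h_(47-m) and the same hypothetical +1/-1 bit mapping. Since h_(4j+p) = h_(4(11-j)+(3-p)), the contribution of taps 6 to 11 for phase p should equal a lower-half (taps 0 to 5) LUT entry. That entry is read at the bit-reversed 6-bit address and at phase 3 - p. If so, a single LUT set can serve both halves when its address is multiplexed. The sketch below verifies that identity; the function names and coefficient values are hypothetical.

```python
from typing import List, Sequence, Tuple

PHASES, HALF = 4, 6


def symbols(address: int, width: int) -> List[int]:
    # Assumed mapping: bit k of the address is one input sample; 1 -> +1, 0 -> -1.
    return [1 if (address >> k) & 1 else -1 for k in range(width)]


def half_lut(h: Sequence[int], first_tap: int) -> List[Tuple[int, ...]]:
    """64-row, 4-phase LUT for the six taps starting at first_tap."""
    return [tuple(sum(x[k] * h[PHASES * (first_tap + k) + p] for k in range(HALF))
                  for p in range(PHASES))
            for x in (symbols(a, HALF) for a in range(2 ** HALF))]


def reverse_bits(value: int, width: int) -> int:
    """Mirror the bit order: bit k moves to bit width-1-k."""
    out = 0
    for k in range(width):
        out = (out << 1) | ((value >> k) & 1)
    return out


def mirrored_half_is_redundant(h: Sequence[int]) -> bool:
    """True if the upper-half LUT equals the lower one read at a mirrored address/phase."""
    lower, upper = half_lut(h, 0), half_lut(h, HALF)
    return all(upper[a][p] == lower[reverse_bits(a, HALF)][PHASES - 1 - p]
               for a in range(2 ** HALF) for p in range(PHASES))


base = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3, -5, 8,
        9, 7, -9, 3, 2, 3, 8, -4, 6, 2, 6, 4]      # hypothetical first 24 coefficients
h_sym = base + base[::-1]                           # h_m == h_(47-m)
print(mirrored_half_is_redundant(h_sym))            # True
```

For non-symmetric coefficients, such as h_m = m, the check returns False. The saving depends entirely on the linear-phase (symmetric) property of the pulse-shaping filter.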

Figure 3: Single-filter block with three LUT partitioning solutions: (a) two LUTs, (b) four LUTs, (c) six LUTs.

As mentioned earlier, in each input channel of the QPSK modulator, the quadrature-phase and in-phase signal components form two independent input streams. The Q and I data streams are fed into the proposed single-channel dual-filter block, which outputs the FIR-filtered Q and I data streams. The input stage of the proposed FIR filter block consists of a set of pipeline registers that receive the Q and I data streams. The Q and I data are shifted into the registers as shown in Figure 6. When the register chain is full, the 12 input bits are packed into smaller data clusters. The clusters are then dispatched to the appropriate LUTs via selection multiplexers for filtering. As reported in Table 1, the twelve input bits can be split so as to share 1~3, or 6, partitioned LUTs. The area of each LUT decreases exponentially as the cluster size shrinks, so the total LUT area generally decreases as the number of partitions grows. However, adders, multiplexers, intermediate latches, and other glue logic may increase the overall area. Exact area estimation therefore requires accounting for the partitioned LUTs and also for the glue logic that each partitioning solution adds. The proposed single-channel dual-filter block has been implemented with three partitioning solutions: tri-partitioning, bi-partitioning, and a single LUT. Empirical results show that bi-partitioning the LUT with 3-bit cluster sizes gives the best area reduction for the LUT and its surrounding logic. This solution appears as the shaded region of Table 1. Figure 6(b) shows the proposed single-channel dual-filter block obtained by bi-partitioning the LUT. Note that implementing a dual-filter block by directly merging two single-filter structures of Figure 3 requires four times the number of LUTs. A more efficient dual-filter scheme exploits the symmetric properties of the filter coefficients and shares the LUTs by multiplexing the clustered data.
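The LUT sharing in the bi-partitioned dual-filter block can be sketched as follows. The sketch uses the same assumptions as before: symmetric coefficients and a hypothetical +1/-1 bit mapping. Only the two 3-bit-cluster LUTs of the lower half are built (taps 0 to 2 and taps 3 to 5). A multiplexer step then feeds them, in turn, the forward Q half, the mirrored Q half, the forward I half, and the mirrored I half. This four-step schedule and the class name `SharedDualFilter` are illustrative assumptions, not the paper's timing. The sketch only confirms that the shared tables reproduce Equation 1 for both streams.

```python
from typing import List, Sequence, Tuple

PHASES, WINDOW, HALF, CLUSTER = 4, 12, 6, 3


def symbols(address: int, width: int) -> List[int]:
    # Assumed mapping: bit k of the address is one input sample; 1 -> +1, 0 -> -1.
    return [1 if (address >> k) & 1 else -1 for k in range(width)]


def reverse_bits(value: int, width: int) -> int:
    out = 0
    for k in range(width):
        out = (out << 1) | ((value >> k) & 1)
    return out


def direct_filter(window: int, h: Sequence[int]) -> List[int]:
    """Reference: Equation 1 with bit j of the window holding x(n-j)."""
    x = symbols(window, WINDOW)
    return [sum(x[j] * h[PHASES * j + p] for j in range(WINDOW)) for p in range(PHASES)]


class SharedDualFilter:
    """Two 8-row LUTs (taps 0-2 and 3-5) shared by the Q and I filters."""

    def __init__(self, h: Sequence[int]):
        self.luts = [self._cluster_lut(h, t) for t in (0, CLUSTER)]
        self.reads = 0

    @staticmethod
    def _cluster_lut(h: Sequence[int], first_tap: int) -> List[Tuple[int, ...]]:
        rows = []
        for a in range(2 ** CLUSTER):
            x = symbols(a, CLUSTER)
            rows.append(tuple(sum(x[k] * h[PHASES * (first_tap + k) + p]
                                  for k in range(CLUSTER)) for p in range(PHASES)))
        return rows

    def _read_half(self, addr6: int) -> List[int]:
        # One multiplexer step: both shared LUTs are read with one 6-bit half window.
        out = [0] * PHASES
        for i, lut in enumerate(self.luts):
            row = lut[(addr6 >> (i * CLUSTER)) & 0b111]
            out = [a + b for a, b in zip(out, row)]
            self.reads += 1
        return out

    def _filter(self, window: int) -> List[int]:
        forward = self._read_half(window & 0b111111)                    # taps 0..5
        mirrored = self._read_half(reverse_bits(window >> HALF, HALF))  # taps 6..11
        return [forward[p] + mirrored[PHASES - 1 - p] for p in range(PHASES)]

    def step(self, q_window: int, i_window: int) -> Tuple[List[int], List[int]]:
        """Filter one Q window and one I window (Q_f, Q_r, I_f, I_r in turn)."""
        return self._filter(q_window), self._filter(i_window)


base = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3, -5, 8,
        9, 7, -9, 3, 2, 3, 8, -4, 6, 2, 6, 4]      # hypothetical coefficients
h_sym = base + base[::-1]
dual = SharedDualFilter(h_sym)
q_out, i_out = dual.step(0b101100111010, 0b000111100101)
assert q_out == direct_filter(0b101100111010, h_sym)
assert i_out == direct_filter(0b000111100101, h_sym)
print(dual.reads)  # 8 LUT reads: 2 LUTs x 4 multiplexer steps
```

In this model, two 8-row tables replace the four 8-row tables needed per filter without sharing. The price is more LUT reads per output, which is consistent with the proposed blocks running at a higher clock rate than the conventional designs.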
The output stage of the proposed dual filter consists of pipelined latches, a tree of adders (a 10-bit CSA and an 11-bit CLA), and a pair of filter output selector latches. After the constant values are fetched from the preceding LUT stage, the partial results for the Q data and I data are piped to the adder tree in alternating order. The 10-bit CSA and 11-bit CLA accumulate the partial results to form the final filtered value. The filter output selector latches then direct the filtered Q and I data to the corresponding filter outputs.

Figure 6: Proposed single-channel dual-filter block: (a) single LUT, (b) bi-partitioned LUT, (c) tri-partitioned LUT.

V. EXPERIMENTAL RESULTS

The proposed 1:4 interpolation FIR filter has been designed with different LUT partitioning solutions. The design was simulated with Verilog-XL and synthesized with Design Compiler. Power was measured at the logic level with the PowerMill circuit simulator, using Samsung's STD70 0.6 µm process standard cell library. Post-synthesis sessions were carried out for the proposed FIR filter implementation to obtain area and power measures for the 3-channel, 6-filter configuration. These results were compared against three conventional FIR filter implementations under equivalent configurations:

- a 4-bank filter block employing ROM-based LUTs,
- a transversal filter block, and
- the single-architecture dual-channel filter block.

Table 2 compares the area of the proposed FIR filter with conventional pulse-shaping 1:4 interpolation FIR filter implementations. The relative costs of all implementations are normalized to the single-architecture dual-channel filter design, which has been widely employed in 3-channel W-CDMA mobile station modulators. The proposed LUT-partitioned FIR filter design reduces area by about 40%. Note that implementing a 3-channel, 6-filter modulator requires two blocks of the single-architecture dual-channel filter. The proposed single-channel dual-filter block design scales readily to an N-channel, 2N-filter ensemble, so it easily meets the 3-channel, 6-filter specification of the W-CDMA modulator. Among the proposed filter implementations shown in Table 2, Type 2 (bi-partitioned LUT) gives the best area savings.

Table 2: Area measures

| Filter | Gate count (per block) | # blocks used | Total gate count | Relative cost (%) |
|---|---|---|---|---|
| 4-bank filter employing ROM-based LUTs | 15,213.0 | 6 | 91,278.0 | 1,612.1 |
| Transversal filter | 4,656.0 | 6 | 27,936.0 | 493.4 |
| Single-architecture dual-channel filter | 2,831.0 | 2 | 5,662.0 | 100.0* |
| Proposed Type 1: Tri-partitioned LUT | 1,301.5 | 3 | 3,904.0 | 69.0 |
| Proposed Type 2: Bi-partitioned LUT | 1,118.5 | 3 | 3,355.5 | 59.3 |
| Proposed Type 3: Single LUT | 1,361.0 | 3 | 4,083.0 | 72.1 |

* Experimental results are compared against the single-architecture dual-channel filter.

Table 3: Power measures

| Filter | Clock frequency (MHz) | Filter power consumption (mW) | # blocks used | Total power (mW) |
|---|---|---|---|---|
| 4-bank filter employing ROM-based LUTs | 19.68 | 46.2 | 6 | 277.2 |
| Transversal filter | 19.68 | 49.1 | 6 | 294.6 |
| Single-architecture dual-channel filter | 19.88 | 31.0 | 2 | 62.0 |
| Proposed Type 1: Tri-partitioned LUT | 39.76 | 42.5 | 3 | 127.5 |
| Proposed Type 2: Bi-partitioned LUT | 39.76 | 26.8 | 3 | 80.4 |
| Proposed Type 3: Single LUT | 39.76 | 31.6 | 3 | 94.8 |

Table 3 reports the total power consumption of the proposed FIR filter compared with conventional filters. To measure the average power consumption, the synthesized cells were analyzed and characterized with EPIC's PowerMill over 24,000 ns. All types of the proposed filter implementation consume less power than the ROM-based and transversal filters, despite little power optimization effort. Note that even at twice the clock frequency, the proposed filters consume power comparable to or less than that of conventional designs. Among the proposed filter implementations in Table 3, Type 2 (bi-partitioned LUT) gives the best power and area savings.
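The relative costs and total power figures in the area and power tables follow from simple arithmetic:

- total gate count = gate count per block × number of blocks,
- relative cost = total gate count / 5,662 × 100, normalized to the single-architecture dual-channel filter, and
- total power = per-block power × number of blocks.

The short Python check below recomputes them from the total gate counts and per-block power figures reported for each design. The dictionary keys are shorthand labels chosen for this sketch, not the paper's names.

```python
# Figures from the area and power tables of Section V (keys are shorthand labels).
BLOCKS = {"rom_4bank": 6, "transversal": 6, "dual_channel": 2,
          "type1_tri": 3, "type2_bi": 3, "type3_single": 3}
TOTAL_GATES = {"rom_4bank": 91278.0, "transversal": 27936.0, "dual_channel": 5662.0,
               "type1_tri": 3904.0, "type2_bi": 3355.5, "type3_single": 4083.0}
POWER_PER_BLOCK_MW = {"rom_4bank": 46.2, "transversal": 49.1, "dual_channel": 31.0,
                      "type1_tri": 42.5, "type2_bi": 26.8, "type3_single": 31.6}
REFERENCE = "dual_channel"   # relative cost is normalized to this design (= 100.0)


def relative_cost(name: str) -> float:
    """Total gate count as a percentage of the reference design, to one decimal."""
    return round(TOTAL_GATES[name] / TOTAL_GATES[REFERENCE] * 100, 1)


def total_power_mw(name: str) -> float:
    """Per-block power times the number of blocks needed for 3 channels / 6 filters."""
    return round(POWER_PER_BLOCK_MW[name] * BLOCKS[name], 1)


for name in BLOCKS:
    print(f"{name:13s} cost {relative_cost(name):7.1f}  power {total_power_mw(name):6.1f} mW")

print(f"Type 2 area reduction: {100 - relative_cost('type2_bi'):.1f}%")  # 40.7%
```

The recomputed Type 2 figure (59.3, i.e. about a 40.7% reduction) matches the "over 40%" area saving claimed for the bi-partitioned design.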
The low-energy profile can be attributed to two effects: shorter critical paths from the smaller LUTs, and retiming induced by the latches between the LUTs and the filter output.

VI. CONCLUSION & FUTURE RESEARCH

In this paper, we described the design of an area-efficient 1:4 interpolation FIR filter based on LUT partitioning. Six filtering blocks (three channels of in-phase and quadrature-phase input signals) reside within the core blocks of the W-CDMA mobile station modulator. Conventional designs seeking an area-efficient FIR filter implementation have concentrated on reducing the LUT size, since the LUT block occupies 60% to 99% of the total area of each FIR filter. The proposed filter structure reduces LUT size in three ways: by partitioning the LUT, by exploiting coefficient symmetry, and by sharing the partitioned LUTs through multiplexing of the input data streams. Compared with conventional FIR filter blocks, the proposed pulse-shaping filter reduces gate area by more than 40%. Although no low-power design techniques were applied, the proposed FIR filter consumes power comparable to that of the popular single-architecture dual-channel filter. For portable communication system applications, future work should include a power-conscious design effort employing signal transition analysis.

Acknowledgement: This research was supported by the Sogang University Research Grants in 2000.

REFERENCES

[1] K. Yeon et al., "Design of Chip Set for CDMA Mobile System," ETRI Journal, Vol. 19, No. 3, Oct. 1997, pp. 228-241.
[2] A. Oppenheim and R. Schafer, Discrete-Time Signal Processing, Prentice-Hall: Englewood Cliffs, New Jersey, 1989.
[3] J. Holmes, Coherent Spread Spectrum Systems, Wiley: New York, 1982.
[4] S. Glisic and P. Leppanen, Code Division Multiple Access Communications, Kluwer Academic Publishers, 1995.
[5] J. Proakis and D. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications, Prentice-Hall: Upper Saddle River, New Jersey, 1996.
[6] R. Peterson, R. Ziemer, and D. Borth, Introduction to Spread-Spectrum Communications, Prentice Hall: Englewood Cliffs, New Jersey, 1995.
[7] I. Kang, K. Yeon, H. Jo, J. Chong, and K. Kim, "Multiple 1:N Interpolation FIR Filter Design Based on a Single Architecture," in Proc. IEEE Int. Symposium on Circuits and Systems, Vol. 2, May 1998, pp. 16~19.
[8] G. Do and K. Feher, "Efficient Filter Design for IS-95 CDMA Systems," IEEE Trans. on Consumer Electronics, Vol. 42, No. 4, Nov. 1996, pp. 1011-1020.
[9] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design, 2nd Edition, Addison-Wesley, 1993.

Sun-Young Hwang received the B.S. degree in electronic engineering from Seoul National University, Seoul, Korea, in 1976, the M.S. degree from the Korea Advanced Institute of Science in 1978, and the Ph.D. degree in electrical engineering from Stanford University, California, U.S.A., in 1986. Since 1986, he has been with the Center for Integrated Systems at Stanford University, working on the design of a high-level synthesis and simulation system. In 1986 and 1987, he held a consulting position at the Palo Alto Research Center of Fairchild Semiconductor Corporation. In 1989, he joined the Department of Electronic Engineering at Sogang University, where he is now a professor. His current research interests include hardware/software co-design and DSP/VLSI systems design. E-mail: hwang@ccs.sogang.ac.kr, Tel: +82-2-705-8469, Fax: +82-2-272-220

Jong-Kwan Choi received the B.S. degree in electronic engineering from Sogang University, Seoul, Korea, in 1992. From 1992 to 1995, he was a senior engineer at the ASIC center (Semiconductor division) of Daewoo Telecom., Ltd. He joined IAE as a research engineer in 1996 and is currently working towards the M.S. degree in electrical engineering at Sogang University. His current research interests include communication/network (ATM and Ethernet) system IC design and mobile communication/modem design. E-mail: cjksgu@yahoo.com, Tel: +82-2-705-8469, Fax: +82-2-272-220

Sik Kim received the B.S. and M.S. degrees in electronic engineering from Sogang University, Seoul, Korea, in 1994 and 1996, respectively. He is currently working towards the Ph.D. degree in electronic engineering at Sogang University. His current research interests include digital VLSI system design, high-speed computer architecture, and CAD system development. E-mail: cosmos@eecad.sogang.ac.kr, Tel: +82-2-705-8469, Fax: +82-2-272-220