Operating Bio-Implantable Devices in Ultra-Low Power Error Correction Circuits: using optimized ACS Viterbi decoder

Roshini R, Udhaya Kumar C, Muthumani D

Abstract

Although many low-power error correction circuit implementations have been reported, to date there has been no comprehensive study evaluating the comparative efficiencies of alternative analog and digital implementations. This has motivated an analysis of the various analog and digital iterative message passing algorithms that run on these decoders. The inefficiency and high cost of decoders have led to a significant loss in data transfer rate. In this work, the Min-Sum algorithm for Analog Belief Propagation decoders and the Viterbi algorithm for Turbo decoders are coded and simulated. The paper aims at reducing power consumption by using a modified Viterbi decoder in a higher-end CMOS technology. A power analysis is carried out for comparative study. The power constraint level of 10mW is reduced further without altering the requirements for smooth operation of the device. This is achieved by designing a novel decoder that increases the data rate at lower power consumption. The reduction in power consumption comes at the cost of reduced error performance. The most efficient decoder is implemented in bio-implantable devices, such as cortical implants, that operate under ultra-low-power constraints.

Index Terms: Analog belief propagation (ABP), cortical implants, error correction circuits (ECC), iterative message passing, Turbo decoders, Viterbi algorithm.

I. INTRODUCTION

In information theory and coding, with applications in computer science and telecommunication, error detection and correction (error control) are techniques that enable reliable delivery of digital data over unreliable communication channels. Noise is prevalent in communication channels and introduces errors during transmission from source to receiver.
Error detection methods permit detecting these errors, whereas error correction enables reconstruction of the original data. In wireless communication, data transmission is often corrupted by noise and channel distortion. To rectify these errors, redundancy is used. This paper evaluates techniques for implementing error correction codes in wireless applications with heavy power constraints, such as bio-implantable devices and energy harvesting motes. Decoding is accomplished iteratively by exchanging messages between sub-decoders. The messages are interpreted as local decoding estimates for each of the sub-codes, and by combining all local information a message passing decoder obtains dramatically improved performance.

With the aim of identifying the most suitable decoder, Section II outlines the concept of decoders in wireless communication and the motivation. Section III describes the Min-Sum algorithm in an ABP decoder, the single stage Viterbi decoder and a modified ACS unit in a Viterbi decoder using 0.35µm CMOS technology. Section IV presents the proposed modified butterfly arrangement of the ACS unit in a Viterbi decoder using 90nm CMOS technology. In Section V the power figures are tabulated and compared, and the most suitable decoder is chosen.

II. DECODERS IN WIRELESS COMMUNICATION

Demand for turbo codes in wireless communication systems has been increasing since their appearance in the early 1990s, owing to their outstanding bit error rate performance. Various turbo decoders have been developed to improve performance at the algorithm and architecture levels. A dual-mode decoder for convolutional and turbo codes has also been introduced for multi-standard wireless communication systems, in order to support the different standards of such systems. There is therefore a growing need for optimized decoders that can be implemented and used in bio-implantable devices.

III. DECODING TECHNIQUES EXECUTED

A. The Min-sum (MS) Algorithm for Analog Belief Propagation decoders

In-body communication links are attracting growing interest from researchers across many branches of engineering, biology and medicine. Some applications place heavy demands on communication performance, especially in the area of cortical stimulation for neural rehabilitation and brain-machine interfaces. Cortical interfaces demand high-speed data transfer to receivers located inside the body; however, the implanted receiver must maintain very low power consumption to avoid overheating the surrounding tissue. This makes optimizing receiver-side performance a challenging problem, and hence the need for soft decision error detection decoders arises. The MS algorithm is an approximate version of the belief propagation (BP) algorithm that obtains significantly reduced complexity with a mild performance loss. Fig 1 shows the decoder design for which the algorithm is coded.

Fig 1: Min-Sum decoder architecture

Fig 1 shows the memory management of the decoder. A (3,5) code is used for clarity. As indicated in the figure, MEM1 is the message memory bank for the shaded part of the parity check matrix. It saves only the check-to-variable messages, in compressed form. MEM ij (i = 2, 3; j = 1 to 5) saves the messages of sub-block (i, j) of the parity check matrix. For each memory bank there is an address generator to control memory access. Because of the special structure of quasi-cyclic (QC) LDPC codes, the address generator for each memory bank can be built with a simple counter. The Variable Node Processor (VNP) and Check Node Processor (CNP) read their inputs from the appropriate MEMs and write the computed messages back to the same address. In addition, five memory banks are instantiated to save the channel information, one for every P variable nodes.

Fig 2: CNP architecture
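As a concrete illustration of the min-sum approximation used by the check node processor, the following Python sketch performs one check-node update on a list of variable-to-check messages. Each outgoing message takes the product of the signs of the other inputs and the smallest magnitude among the other inputs, which only requires tracking the minimum, the second minimum and the index of the minimum. The function name `check_node_update` and the use of floating-point messages are assumptions made for illustration; they are not taken from the hardware design described in this paper.

```python
from typing import List


def check_node_update(msgs: List[float]) -> List[float]:
    """Min-sum check-node update (illustrative software sketch).

    msgs holds the variable-to-check messages on one row of the parity
    check matrix. Returns one check-to-variable message per input.
    """
    if len(msgs) < 2:
        raise ValueError("a check node needs at least two inputs")

    mags = [abs(m) for m in msgs]
    # Track the minimum magnitude, its index, and the second minimum.
    idx_min = min(range(len(mags)), key=mags.__getitem__)
    min1 = mags[idx_min]
    min2 = min(m for i, m in enumerate(mags) if i != idx_min)

    # Overall sign: -1 if an odd number of inputs are negative.
    total_sign = 1
    for m in msgs:
        if m < 0:
            total_sign = -total_sign

    out = []
    for i, m in enumerate(msgs):
        # Exclude the edge's own contribution from sign and magnitude.
        own_sign = -1 if m < 0 else 1
        mag = min2 if i == idx_min else min1
        out.append(total_sign * own_sign * mag)
    return out


if __name__ == "__main__":
    print(check_node_update([2.0, -0.5, 1.5, -3.0]))
```

Only the edge that supplied the minimum receives the second minimum; every other edge receives the minimum itself, which is why three stored values per row are sufficient.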

Fig 2 shows the CNP architecture of the ABP decoder. It finds the two smallest inputs and the index of the smallest one. The function of the sub-module MIN is to record the minimum, the second minimum (2nd-Min) and the index of the minimum. The check node process can be time consuming for a matrix with a large row weight; together with the time needed for the variable node process, this makes the critical path long. To increase the clock speed, the data paths are cut by two-level pipelining. The pipelining increases the clock speed without inducing memory access conflicts [10].

Fig 3: VNP scale module architecture

The VNP scale module either scales the variable-to-check message or subtracts an offset from it. The algorithm is adapted to suit hardware implementation. The pseudo code is as follows:

    if input >= 8
        output = 3'b111;
    else if input >= 4
        output = input - 1;
    else
        output = input;    (unchanged)

The generic flow of the MS algorithm is described by the flow chart in Fig 4. First, all check node inputs are initialized to 0, and a check node update step (i.e., row processing) is performed to produce the α messages. Second, the variable nodes receive the new α messages, and the variable node update step (i.e., column processing) follows to produce the β messages. The process repeats for another iteration by passing the previous iteration's β messages to the check nodes. The algorithm terminates when it reaches the maximum number of decoding iterations (I max) or when a valid code word is detected.

Fig 4: Flow diagram of MS algorithm

The decoding steps are given below:

a) Initialization: Read the values from the channel and store them in the channel memories.
b) Iteration: Compute the messages from the variable nodes to the check nodes and save them in the message memories MEM ij, except for the upper P rows. For the upper P rows, perform the check node processing and save the returned messages in MEM1 in compressed form.
c) Check node process: Compute the check-to-variable messages of the lower 2P rows and save them back in MEM ij. Iterate until all the check equations are satisfied or the maximum iteration number is reached.
d) Output the decoded code word.

Fig 5: Simulation of the CNP processor in an ABP decoder using MS algorithm.

The simulation depicts the working of a check node processor. Six input signals are applied and checked. The inputs are compared and assigned an index. Through a series of iterations the final min_2 value is obtained. These results show that the algorithm optimizes for the least sum value in message passing. In this paper, the power consumption of the CNP for an ABP decoder is calculated and compared with that of the other decoders.

Table 1: Power consumption summary for an ABP decoder using Min-Sum algorithm.

The power consumed by the MS algorithm is about 63mW. The ABP decoder has high circuit complexity and average performance. The other decoders and their simulations are explained next.

B. Single Stage Viterbi decoder

Several convolutional decoding methods exist, such as sequential and Viterbi decoding, of which the most commonly employed is the Viterbi algorithm (VA). The VA is an advanced algorithm in comparison to the MS algorithm.

Fig 6: Block diagram of the single stage Viterbi decoder

A Viterbi decoder consists of three major parts: the branch metric unit, the path metric unit and the trace back unit.

Branch Metric Calculation

The first unit is the branch metric unit. The distance values computed at each time instant for the paths between the states at the previous time instant and the states at the current time instant are called branch metrics. The Hamming distance or the Euclidean distance is used for branch metric computation.

Path Metric Calculation

This is also called the Add Compare Select (ACS) unit. The error metric, also called the path metric (PM), contains the 2^(K-1) optimal paths, where K is the constraint length of the code. The obtained branch metric is added to the previous PM, and the two resulting distances are compared in each add-compare-select unit. The speed and performance of the Viterbi decoder are mainly determined by the number of ACS units (2^(K-1)) and their computation time.

Fig 7: Block diagram of the ACS unit

The ACS unit for a single stage decoder is simulated and the power summary is obtained. Since it is only a single stage decoder, the gates used in this unit draw high power. Even so, the power consumption is reduced to 45mW (see Table 2). In this paper, the power consumption of the single stage Viterbi decoder is calculated for a supply voltage of 0.6V.

Fig 8: Simulation for a single stage Viterbi decoder ACS unit.

Table 2: Power consumption summary of the single stage Viterbi decoder

The power summary clearly indicates the reduction in consumption of the Viterbi decoder as compared to the ABP decoder: a reduction of approximately 18mW (about 29%, from 63mW to 45mW) is obtained.

C. Viterbi decoder with modified ACS unit using 0.35µm Technology

The quality of a Viterbi decoder design is mainly measured by three criteria: coding gain, throughput, and power dissipation [4]. High coding gain results in a low data transfer error probability, while high throughput is necessary for high-speed applications. The design of Viterbi decoders with high coding gain and throughput is made challenging by the need for a low power circuit implementation. The ACS unit is arranged in a butterfly manner.

Fig 9: The modified ACS unit in a Viterbi decoder

Existing implementations report a power consumption of 14.88mW at a supply voltage of 0.6V. BM and PM are the branch metric and path metric parameters. The unit has inputs S a and S b, and outputs S 0 and S 1. The ACS unit has been rearranged in a butterfly network for optimization. This reduces the number of repeated blocks, which in turn decreases the power consumption.

Table 3: Power summary of Viterbi decoder with modified ACS unit.

Design              Technology   Vdd (supply voltage)   Power (mW)
Modified ACS unit   0.35µm       3.3V                   109
Modified ACS unit   0.35µm       2.5V                   62
Modified ACS unit   0.35µm       0.6V                   14.88

The power figures make clear that the modified ACS unit is better than the ABP decoder and the single stage Viterbi decoder. The reduction in power consumption is 76.38% with respect to the ABP decoder using the MS algorithm and 66.93% with respect to the single stage Viterbi decoder.

IV. PROPOSED DECODER

With an optimized ACS unit in a Viterbi decoder using 90nm CMOS technology, the layout and the power calculation for a single ACS unit are obtained. The ACS unit consists of two 2-bit subtractor circuits, an 8-bit subtractor, two 8-to-2-bit comparators and two 8-to-3-bit adders. The proposed design is a three stage Viterbi decoder with an optimized ACS unit in 90nm technology. Each block in the block diagram (Fig 10) represents an ACS unit; the layout is shown in Fig 11.
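The branch metric and add-compare-select steps described in Section III can be sketched in a few lines of Python. The example below is a software illustration only, assuming hard-decision inputs, Hamming-distance branch metrics and a single radix-2 butterfly in which predecessor states S_a and S_b feed successor states S_0 and S_1. The function names and the example metric values are hypothetical and are not taken from the circuit reported in this paper.

```python
from typing import Tuple


def hamming_distance(received: Tuple[int, ...], expected: Tuple[int, ...]) -> int:
    """Branch metric: number of bit positions where the symbols differ."""
    return sum(r != e for r, e in zip(received, expected))


def acs(pm_a: int, bm_a: int, pm_b: int, bm_b: int) -> Tuple[int, int]:
    """Add-compare-select for one successor state.

    Adds each branch metric to its predecessor path metric, compares the
    two candidates and returns (surviving metric, decision bit). The
    decision bit is 0 if the path from S_a survives and 1 for S_b.
    """
    cand_a = pm_a + bm_a
    cand_b = pm_b + bm_b
    if cand_a <= cand_b:
        return cand_a, 0
    return cand_b, 1


def acs_butterfly(pm_a: int, pm_b: int,
                  bm_a0: int, bm_b0: int,
                  bm_a1: int, bm_b1: int):
    """One butterfly: predecessors (S_a, S_b) update successors (S_0, S_1)."""
    s0 = acs(pm_a, bm_a0, pm_b, bm_b0)
    s1 = acs(pm_a, bm_a1, pm_b, bm_b1)
    return s0, s1


if __name__ == "__main__":
    rx = (1, 1)                           # received hard-decision symbol
    bm_00 = hamming_distance(rx, (0, 0))  # branches labelled 00
    bm_11 = hamming_distance(rx, (1, 1))  # branches labelled 11
    # Hypothetical previous path metrics for S_a and S_b.
    print(acs_butterfly(3, 2, bm_00, bm_11, bm_11, bm_00))
```

Because both successor states share the same two predecessor metrics, a butterfly reuses the inputs across two ACS operations, which is the kind of sharing a butterfly arrangement exploits.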

Fig 10: Block diagram of a three stage ACS unit

The paper reports a power calculation for each individual stage of the ACS unit. The overall power calculation for the three stage ACS unit in this decoder using 90nm technology is shown together with the simulation results and characteristics.

Fig 11: Layout of three stage Viterbi decoder with optimized ACS unit in 90nm technology.

The power consumption of an individual ACS unit is about 8.531µW. If three stages are used, the power is approximately 25.593µW. This is very low in comparison with the other decoders analysed. The data rate is high, but the error performance is somewhat lower. The throughput obtained from the three stage Viterbi decoder with modified ACS unit is 200Mb/s at a supply voltage of 0.6V.
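Taking the reported figures at face value, the three stage total and the corresponding energy per decoded bit follow from simple arithmetic. The short Python sketch below reproduces the 25.593µW figure and divides it by the 200Mb/s throughput. The energy-per-bit value is a derived illustration rather than a result stated in the paper, it assumes every stage draws the same power, and the names used are chosen here for readability.

```python
ACS_POWER_W = 8.531e-6    # power of one optimized ACS unit (reported)
NUM_STAGES = 3            # three stage decoder
THROUGHPUT_BPS = 200e6    # reported throughput in bits per second


def total_power_w(unit_power_w: float, stages: int) -> float:
    """Total power, assuming every stage draws the same power."""
    return unit_power_w * stages


def energy_per_bit_j(power_w: float, throughput_bps: float) -> float:
    """Energy per decoded bit = power / throughput."""
    return power_w / throughput_bps


if __name__ == "__main__":
    p = total_power_w(ACS_POWER_W, NUM_STAGES)
    e = energy_per_bit_j(p, THROUGHPUT_BPS)
    print(f"total power:    {p * 1e6:.3f} uW")
    print(f"energy per bit: {e * 1e12:.3f} pJ")
```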

ISSN: 2319-5967

Fig 12: Power summary of an individual optimized ACS unit, 90nm technology

V. COMPARISON OF RESULTS

Table 4 compares the power figures of the four decoders analyzed. The optimal decoder is the three stage Viterbi decoder with optimized ACS unit using 90nm CMOS technology. Technology scaling has contributed to this power reduction.

Table 4: Power variations of the four simulated decoders

Decoder name                             CMOS technology used   Power consumption   Supply voltage
Min Sum / ABP decoder                    90nm                   63mW                0.6V
Single Stage Viterbi decoder             90nm                   45mW                0.6V
Viterbi decoder with modified ACS unit   0.35µm                 14.88mW             0.6V
Viterbi decoder with modified ACS unit   90nm                   8.531µW             0.6V

Fig 13: Decoder variants vs. power consumption (mW), a graphical representation
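The relative savings implied by Table 4 can be checked with a short calculation. The Python sketch below computes the percentage reduction of each decoder relative to the Min-Sum/ABP baseline, using only the values listed in the table. The dictionary keys and the helper name `reduction_percent` are illustrative choices, and the 90nm entry is the single-ACS-unit power as tabulated.

```python
# Power figures from Table 4, converted to watts.
POWER_W = {
    "min_sum_abp_90nm": 63e-3,
    "single_stage_viterbi_90nm": 45e-3,
    "modified_acs_0p35um": 14.88e-3,
    "optimized_acs_90nm": 8.531e-6,
}


def reduction_percent(baseline_w: float, new_w: float) -> float:
    """Percentage by which new_w is lower than baseline_w."""
    return 100.0 * (baseline_w - new_w) / baseline_w


if __name__ == "__main__":
    base = POWER_W["min_sum_abp_90nm"]
    for name, power in POWER_W.items():
        pct = reduction_percent(base, power)
        print(f"{name:28s} {pct:7.2f} % below the ABP baseline")
```

The same helper can be applied between any two rows, for example the single stage Viterbi decoder against the 0.35µm modified ACS design.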

VI. CONCLUSION

Simulation results show that a decoder with slight modifications to the design achieves a considerable increase in throughput and a decrease in power consumption. The proposed decoder gives the best results among the decoders compared. This serves as a basis for many ultra-low-power applications with tight power constraints.

REFERENCES

[1] Chris Winstead and Joachim Neves Rodrigues, "Ultra-Low-Power Error Correction Circuits: Technology Scaling and Sub-VT Operation," IEEE Trans. on Circuits and Systems II: Express Briefs, Vol. 59, No. 12, December 2012.
[2] O. C. Akgun, J. N. Rodrigues, Y. Leblebici, and V. Owall (2012) "High-level energy estimation in the sub-VT domain: Simulation and measurement of a cardiac event detector," IEEE Trans. Biomed. Circuits Syst., vol. 6, no. 1, pp. 15-27.
[3] O. C. Akgun, J. Rodrigues, and J. Sparso (2010) "Minimum-energy sub threshold self-timed circuits: Design methodology and a case study," in Proc. 16th IEEE Int. Symp. Asynchron. Circuits Syst., pp. 41-51.
[4] Xun Liu and M. C. Papaefthymiou, "Design of a High-Throughput Low-Power IS95 Viterbi Decoder," in Proc. of Design Automation Conf. (DAC), pp. 263-268, June 2002.
[5] Chris Winstead and Yi Luo (2012) "Error Correction Circuits for Bio-Implantable Electronics," Dept. of Electrical and Computer Engineering, Utah State University.
[6] S. Baskar, M. Saravanan (2012) "Error Detection And Correction Enhanced Decoding Of Difference Set Codes For Memory Application," International Journal of Advanced Research in Computer and Communication Engineering, Vol. 1, Issue 10.
[7] R. G. Gallager (1963) "Low-density parity-check codes," IRE Trans. Inf. Theory, vol. 8, no. 1, pp. 21-28.
[8] Y. Sun, J. Cavallaro, and T. Ly (2009) "Scalable and low power LDPC decoder design using high level algorithmic synthesis," in Proc. IEEE Int. SOCC, pp. 267-270.
[9] Tinoosh Mohsenin, Dean N. Truong, and Bevan M. Baas (2010) "A Low-Complexity Message-Passing Algorithm for Reduced Routing Congestion in LDPC Decoders," IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 57, No. 5.
[10] Jin Sha, Minglun Gao, Zhongjin Zhang, Li Li, Zhongfeng Wang (2006) "A Memory Efficient FPGA Implementation of Quasi-Cyclic LDPC Decoder," Proceedings of the 5th WSEAS Int. Conf. on Instrumentation, Measurement, Circuits and Systems, Hangzhou, China, pp. 218-223.