The implementation challenges of polar codes

Robert G. Maunder, CTO, AccelerComm
February 2018

Abstract
Although polar codes are a relatively immature channel coding technique with no previous standardised applications, they have been selected by the 3rd Generation Partnership Project (3GPP) to provide error correction in the New Radio (NR) standard for 5th Generation (5G) mobile communications. Hardware acceleration of polar encoding and decoding will be necessary in order to meet the strict requirements of many 5G applications. However, the polar encoding and decoding processes are complicated, and translating them into hardware is not trivial. This white paper provides a tutorial on the polar encoding and decoding processes, before discussing the challenges of their hardware implementation.

I. INTRODUCTION
In mobile communication, channel coding may be used to protect information against the effects of transmission errors, which may be caused by noise, interference or poor signal strength. More specifically, a channel encoder is used to encode the information in the transmitting device, which may be a base station, a handset or another user device. This allows a corresponding channel decoder to be used in the receiving device, in order to mitigate the transmission errors and recover the transmitted information. In recent decades, several high-performance channel codes have been developed, which allow information to be reliably transmitted at rates that closely approach the theoretical limit imposed by the channel capacity. Specifically, turbo codes have been used in 3rd Generation (3G) and 4th Generation (4G) mobile communication standards, while Low Density Parity Check (LDPC) codes have been adopted in WiFi and satellite standards. More recently, polar codes [1] have emerged, offering particularly strong error correction performance for short messages.
However, polar codes are much less mature than turbo and LDPC codes, having no previous standardised applications. At the time of writing, 3GPP is defining the so-called NR standard [2] as a candidate for 5G mobile communication. Here, polar codes have been selected to provide channel coding in the control channel of the enhanced Mobile BroadBand (eMBB) applications of NR, as well as in the Physical Broadcast Channel (PBCH). Polar codes have also been identified as a candidate to provide channel coding for the data and control channels of the Ultra Reliable Low Latency Communication (URLLC) and massive Machine Type Communication (mMTC) applications of NR. In addition to setting a strict requirement for ultra-reliable error correction, 5G requires error correction to be completed quickly, with a lower latency than in 3G or 4G. Owing to this, many 5G applications will require polar encoding and decoding to be implemented using high-performance hardware acceleration, which must consume a minimal amount of hardware resources and power. Sections II and III of this white paper provide tutorials on the algorithms that underpin polar encoding and decoding, respectively. Following this, Section IV discusses the challenges of implementing these algorithms in hardware. Finally, we offer some concluding remarks in Section V.

© AccelerComm 2018, www.accelercomm.com

II. POLAR ENCODER
A polar encoder comprises three successive components, namely information conditioning, the polar encoder kernal and encoded conditioning, as shown in Figure 1. These components are discussed in the following paragraphs.

The input to the information conditioning component may be referred to as an information block, which comprises K information bits, where K may be referred to as the information size. The information conditioning component interlaces the K information bits with N − K redundant bits, which may be frozen bits [1], Cyclic Redundancy Check (CRC) bits [3] and/or Parity Check (PC)-frozen bits [4] in the NR polar code. Here, frozen bits always adopt a value of 0, while CRC and PC-frozen bits adopt values that are obtained as functions of the information bits. The information conditioning component generates the redundant bits and interlaces them into positions that are identified by a prescribed method, which is also known to the polar decoder. Furthermore, in the NR polar code, the information conditioning component additionally performs code block segmentation, interleaving and scrambling operations, as shown in Figures 6–8. The output of the information conditioning component may be referred to as a kernal information block, which comprises N kernal information bits, where N may be referred to as the kernal size. Here, the information conditioning must be completed such that N is a power of 2 that is greater than K. In the NR polar code, N may adopt values of up to N_max = 1024.

Fig. 1: Top-level schematic of a polar encoder and decoder. Polar encoder in transmitter: information block (K) → information conditioning → kernal information block (N) → polar encoder kernal → kernal encoded block (N) → encoded conditioning → encoded block (M) → modulator → channel. Polar decoder in receiver: channel → demodulator → soft encoded block (M) → encoded conditioning → soft kernal encoded block (N) → polar decoder kernal → recovered kernal information block (N) → information conditioning → recovered information block (K).
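As a minimal sketch of the interlacing step only, the Python below places K information bits into an N-bit kernal information block and fills every other position with a frozen bit of 0. It assumes that the set of information bit positions is simply given. The prescribed NR method for choosing those positions is not modelled, and neither are CRC bits, PC-frozen bits, segmentation, interleaving or scrambling. The positions 3, 5, 6 and 7 follow the N = 8 arrangement of Figure 3. The information bit values and the function names `interlace` and `deinterlace` are hypothetical.

```python
def interlace(info_bits: list[int], info_positions: list[int], n: int) -> list[int]:
    """Build an N-bit kernal information block (hypothetical helper).

    info_positions lists where the K information bits go, in order; every
    other position is treated as a frozen bit with the value 0.
    """
    if n <= 0 or n & (n - 1):
        raise ValueError("N must be a power of 2")
    if len(info_bits) != len(info_positions) or len(info_bits) >= n:
        raise ValueError("need one position per information bit and K < N")
    block = [0] * n  # frozen bits always adopt the value 0
    for pos, bit in zip(info_positions, info_bits):
        block[pos] = bit
    return block


def deinterlace(block: list[int], info_positions: list[int]) -> list[int]:
    """Inverse operation in the decoder: discard the redundant (here, frozen) bits."""
    return [block[pos] for pos in info_positions]


kernal_info = interlace([1, 0, 1, 1], [3, 5, 6, 7], 8)
print(kernal_info)                             # [0, 0, 0, 1, 0, 0, 1, 1]
print(deinterlace(kernal_info, [3, 5, 6, 7]))  # [1, 0, 1, 1]
```

The same position list must be known to both ends. This is why the text requires the method that identifies the positions to be shared with the polar decoder.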
The input to the polar encoder kernal is a kernal information block and its output may be referred to as a kernal encoded block, which comprises N kernal encoded bits. The operation of the polar encoder kernal may be illustrated by a polar code graph representation, which is exemplified in Figure 2. Here, the ⊕ symbol represents a binary exclusive-OR (XOR) operation. Note that the graph comprises N inputs on its left edge and N outputs on its right edge, corresponding to the N kernal information bits and the N kernal encoded bits, respectively. The graph comprises log2(N) stages, each of which comprises N/2 vertically aligned XORs, giving a total of (N/2)log2(N) XORs. Note that there are data dependencies between successive stages, which enforce a left-to-right processing schedule. More specifically, the data dependencies prevent the XORs in a particular stage from being computed until after the XORs in the stage to its left have been computed.

Note that successive graph representations have recursive relationships. More specifically, the graph representation for a polar encoder kernal having a kernal size of N = 2 comprises a single stage, containing a single XOR. The first of the N = 2 kernal encoded bits is obtained as the XOR of the N = 2 kernal information bits, while the second kernal encoded bit is equal to the second kernal information bit. For greater kernal sizes N, the graph representation may be considered to be a vertical concatenation of two graph representations for a kernal size of N/2, followed by an additional stage of XORs, as shown in Figure 2. In analogy with the N = 2 kernal described above, the first N/2 of the N kernal encoded bits are obtained as XORs of corresponding bits from the outputs of the two N/2 kernals, while the second N/2 of the kernal encoded bits are equal to the output of the second N/2 kernal.

Fig. 2: Polar code graphs for N ∈ {2, 4, 8}, each with inputs 0 to N − 1 on its left edge and outputs 0 to N − 1 on its right edge; the N = 8 graph comprises stages 0, 1 and 2.

The input to the encoded conditioning component of the polar encoder is a kernal encoded block and its output may be referred to as an encoded block, which comprises M encoded bits, where M may be referred to as the encoded size. The resultant polar coding rate is given by R = K/M, where the encoded conditioning must be completed such that M is greater than K, although M may be higher or lower than N. The encoded conditioning component may use various techniques to generate the M encoded bits. More specifically, repetition [5] may be used to repeat some of the N bits in the kernal encoded block, while shortening or puncturing techniques [5] may be used to remove some of the N bits in the kernal encoded block. Note that shortening removes bits that are guaranteed to have values of 0, while puncturing removes bits that may have values of either 0 or 1. In addition to this rate matching operation, the encoded conditioning component also performs sub-block interleaving, channel interleaving and code block concatenation operations in the NR polar code, as shown in Figures 6–8. Following polar encoding, the encoded block may be provided to a modulator, which transmits it over a communication channel.
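The recursive kernal structure of Figure 2 translates directly into a short recursive function. The Python sketch below is an illustration that assumes bits are held in natural order, with input i on row i of the graph and no bit-reversal permutation. It also gives an equivalent stage-by-stage form. That form processes the log2(N) stages from left to right with N/2 XORs each, which confirms the (N/2)log2(N) XOR count. Both function names are hypothetical.

```python
def polar_encode_kernal(u: list[int]) -> list[int]:
    """Recursive form: two N/2 kernals followed by one extra stage of XORs."""
    n = len(u)
    if n == 1:
        return list(u)
    top = polar_encode_kernal(u[: n // 2])
    bottom = polar_encode_kernal(u[n // 2 :])
    # First N/2 outputs: XORs of corresponding outputs; last N/2: bottom kernal output.
    return [a ^ b for a, b in zip(top, bottom)] + bottom


def polar_encode_stages(u: list[int]) -> tuple[list[int], int]:
    """Stage-by-stage form; also returns the number of XORs performed."""
    x = list(u)
    n = len(x)
    xors = 0
    span = 1
    while span < n:  # one iteration per stage, processed from left to right
        for start in range(0, n, 2 * span):
            for i in range(start, start + span):
                x[i] ^= x[i + span]  # uses results of the stage to the left
                xors += 1
        span *= 2
    return x, xors


u = [0, 0, 0, 1, 0, 0, 1, 1]
print(polar_encode_kernal(u))  # [1, 0, 1, 0, 0, 1, 0, 1]
print(polar_encode_stages(u))  # ([1, 0, 1, 0, 0, 1, 0, 1], 12)
```

For N = 8 there are 3 stages of 4 XORs, giving 12 XORs. The stage loop cannot start a stage before the previous one finishes, which mirrors the left-to-right data dependency described above.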
The complete polar encoding process is exemplified in Figure 3, for the case where a particular arrangement of frozen bits is used to convert K = 4 information bits into M = 8 encoded bits.

Fig. 3: Example polar encoding process, using the N = 8 polar code graph, illustrating the case where a particular arrangement of frozen bits is used to convert K = 4 information bits into M = 8 encoded bits. Info bits 0 to 3 enter the graph at kernal positions 3, 5, 6 and 7, and encoded bits 0 to 7 leave its right-hand edge.
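The encoded conditioning step can be illustrated with a simplified rate-matching sketch. For illustration only, it assumes the following conventions:

- Puncturing removes the first N − M kernal encoded bits.
- Shortening removes the last N − M bits.
- Repetition reads the kernal encoded block circularly.

The NR sub-block interleaving, channel interleaving and rules for choosing between these techniques are not modelled, and the name `rate_match` is hypothetical. In the example, the last kernal information position is frozen. For the natural-order kernal of Figure 2, the last kernal encoded bit equals the last kernal information bit, so it is then guaranteed to be 0 and may be shortened.

```python
def polar_encode_kernal(u: list[int]) -> list[int]:
    n = len(u)
    if n == 1:
        return list(u)
    top, bottom = polar_encode_kernal(u[: n // 2]), polar_encode_kernal(u[n // 2 :])
    return [a ^ b for a, b in zip(top, bottom)] + bottom


def rate_match(kernal_encoded: list[int], m: int, mode: str = "puncture") -> list[int]:
    """Turn N kernal encoded bits into M encoded bits (simplified, hypothetical)."""
    n = len(kernal_encoded)
    if m >= n:
        # Repetition: cycle through the kernal encoded block until M bits exist.
        return [kernal_encoded[k % n] for k in range(m)]
    if mode == "puncture":
        # Puncturing may remove bits of either value.
        return kernal_encoded[n - m :]
    if mode == "shorten":
        # Shortening may only remove bits that are guaranteed to be 0.
        if any(kernal_encoded[m:]):
            raise ValueError("shortened bits must all be 0")
        return kernal_encoded[:m]
    raise ValueError(f"unknown mode: {mode}")


# K = 3 information bits at positions 3, 5 and 6 of an N = 8 block; position 7 is frozen.
x = polar_encode_kernal([0, 0, 0, 1, 0, 1, 1, 0])
print(x)                             # [1, 0, 0, 1, 0, 1, 1, 0]
print(rate_match(x, 7, "shorten"))   # [1, 0, 0, 1, 0, 1, 1]
print(rate_match(x, 7, "puncture"))  # [0, 0, 1, 0, 1, 1, 0]
print(rate_match(x, 10))             # [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]
```

With M = 7, the coding rate is R = K/M = 3/7. This satisfies M > K, even though M < N.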

III. POLAR DECODER
In the receiver, the demodulator's role is to recover information pertaining to the encoded block. However, the demodulator is typically unable to obtain absolute confidence about the values of the M bits in the encoded block, owing to the random nature of the noise in the communication channel. The demodulator may express its confidence about the values of the bits in the encoded block by generating a soft encoded block, which comprises M encoded soft bits. Each soft bit may be represented in the form of a Logarithmic Likelihood Ratio (LLR)

$$\mathrm{LLR} = \ln\left[\frac{\Pr(\text{bit} = 0)}{\Pr(\text{bit} = 1)}\right],$$

where Pr(bit = 0) and Pr(bit = 1) are the probabilities that the corresponding bit has the value 0 and 1, respectively. Here, a positive LLR indicates that the demodulator has greater confidence that the corresponding bit has a value of 0, while a negative LLR indicates greater confidence in the bit value 1. The magnitude of the LLR expresses how much confidence the demodulator has: an infinite magnitude corresponds to absolute confidence in this bit value, while a magnitude of 0 indicates that the demodulator has no information about whether the bit value 0 or 1 is more likely.

A polar decoder comprises three successive components, namely encoded conditioning, the polar decoder kernal and information conditioning, as shown in Figure 1. These components are discussed in the following paragraphs.

The input to the encoded conditioning component of the polar decoder is a soft encoded block and its output may be referred to as a soft kernal encoded block, which comprises N kernal encoded LLRs. In order to convert the M encoded LLRs into N kernal encoded LLRs, LLRs of +∞ may be interlaced with the soft encoded block, to occupy the positions that correspond to the 0-valued kernal encoded bits that were removed by shortening in the polar encoder. Likewise, 0-valued LLRs may be interlaced with the soft encoded block, to occupy the positions where kernal encoded bits were removed by puncturing.
In the case of repetition, the LLRs that correspond to replicas of a particular kernal encoded bit may be summed and placed in the corresponding position within the soft kernal encoded block. Additionally, the encoded conditioning component must perform the inverse of the sub-block interleaving, channel interleaving and code block concatenation operations in the NR polar code, as shown in Figures 6–8.

The input to the polar decoder kernal is a soft kernal encoded block and its output may be referred to as a recovered kernal information block, which comprises N recovered kernal information bits. The polar decoder kernal may operate on the basis of various algorithms, including Successive Cancellation (SC) decoding [1] and Successive Cancellation List (SCL) decoding [6], which are detailed in Sections III-A and III-B, respectively.

The input to the information conditioning component of the polar decoder is a recovered kernal information block and its output may be referred to as a recovered information block, which comprises K recovered information bits. The recovered information block may be obtained by removing all redundant bits from the recovered kernal information block. Additionally, the information conditioning component must perform the inverse of the code block segmentation, interleaving and scrambling operations in the NR polar code, as shown in Figures 6–8.
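The decoder-side encoded conditioning can be sketched on LLRs as follows. The sketch assumes BPSK modulation, mapping bit 0 to +1 and bit 1 to −1, over an additive white Gaussian noise channel with noise variance σ². Under these assumptions, the LLR of a received sample y is 2y/σ². It also assumes a rate-matching convention chosen purely for illustration: puncturing removes the first N − M bits, shortening removes the last N − M bits, and repetition cycles through the block. NR deinterleaving is omitted, and the function names are hypothetical.

```python
import math


def bpsk_llrs(received: list[float], noise_var: float) -> list[float]:
    """LLR = ln[Pr(bit = 0) / Pr(bit = 1)] = 2y / sigma^2 for BPSK (0 -> +1, 1 -> -1) over AWGN."""
    return [2.0 * y / noise_var for y in received]


def derate_match(llrs: list[float], n: int, mode: str = "puncture") -> list[float]:
    """Convert M encoded LLRs into N kernal encoded LLRs."""
    m = len(llrs)
    if m >= n:
        # Repetition: sum the LLRs of every replica of the same kernal encoded bit.
        out = [0.0] * n
        for k, llr in enumerate(llrs):
            out[k % n] += llr
        return out
    if mode == "puncture":
        # Punctured bits: a 0-valued LLR means no information about the bit value.
        return [0.0] * (n - m) + list(llrs)
    if mode == "shorten":
        # Shortened bits are known to be 0, i.e. absolute confidence: +infinity.
        return list(llrs) + [math.inf] * (n - m)
    raise ValueError(f"unknown mode: {mode}")


print(bpsk_llrs([0.5, -1.0], noise_var=0.5))          # [2.0, -4.0]
print(derate_match([1.0, -2.0, 3.0], 4, "shorten"))   # [1.0, -2.0, 3.0, inf]
print(derate_match([1.0, -2.0, 3.0], 4, "puncture"))  # [0.0, 1.0, -2.0, 3.0]
print(derate_match([1.0, 2.0, 3.0, 4.0, 5.0], 4))     # [6.0, 2.0, 3.0, 4.0]
```

Summing the LLRs of replicas is valid because, for independent noise on each transmission, the log-likelihood ratios of the same bit add.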

A. SC decoding
A polar decoder kernal that operates on the basis of SC decoding may be considered to have a similar graph structure to a polar encoder, as illustrated in Figure 2. An SC decoder performs computations pertaining to the XORs in the graph, according to a sequence that is dictated by data dependencies. However, when operating on LLRs, the functionality of each XOR in the graph varies between different steps of the SC decoding process. More specifically, there are three types of computation that can be performed by a particular XOR in the graph. The type depends on the availability of LLRs provided on the connections on its right-hand side, as well as upon the availability of bits provided on the connections on its left-hand side.

The first occasion when an XOR can contribute to the SC decoding process is when an LLR has been provided by each of the connections on its right-hand side. As shown in Figure 4(a), we refer to the first and second of these two LLRs as x̃_a and x̃_b, respectively. This enables the XOR to compute an LLR x̃_c for the first of the two connections on its left-hand side, according to the f function

$$\tilde{x}_c = f(\tilde{x}_a, \tilde{x}_b) = 2\tanh^{-1}\left(\tanh(\tilde{x}_a/2)\tanh(\tilde{x}_b/2)\right) \tag{1}$$
$$\tilde{x}_c \approx \mathrm{sign}(\tilde{x}_a)\,\mathrm{sign}(\tilde{x}_b)\min\left(|\tilde{x}_a|, |\tilde{x}_b|\right), \tag{2}$$

where sign(·) returns −1 if its argument is negative and +1 if its argument is positive. Here, (2) is referred to as the min-sum approximation.

Fig. 4: The three computations that can be performed for an XOR in the polar code graph: (a) the f function, x̃_c = f(x̃_a, x̃_b); (b) the g function, x̃_d = g(x̃_a, x̃_b, û_a); and (c) the partial sum calculation, û_c = XOR(û_a, û_b) and û_d = û_b.

Later in the SC decoding process, a bit û_a will be provided on the first of the connections on the left-hand side of the XOR, as shown in Figure 4(b).
Together with the LLRs x̃_a and x̃_b that were previously provided using the connections on the right-hand side, this enables the XOR to compute an LLR x̃_d for the second of the two connections on its left-hand side, according to the g function

$$\tilde{x}_d = g(\tilde{x}_a, \tilde{x}_b, \hat{u}_a) = (-1)^{\hat{u}_a}\tilde{x}_a + \tilde{x}_b. \tag{3}$$

Later still, a bit û_b will be provided on the second of the connections on the left-hand side of the XOR, as shown in Figure 4(c). Together with the bit û_a that was previously provided using the first of the connections on the left-hand side, this enables the partial sum computation of bits û_c and û_d for the first and second connections on the right-hand side of the XOR, where

$$\hat{u}_c = \mathrm{XOR}(\hat{u}_a, \hat{u}_b), \tag{4}$$
$$\hat{u}_d = \hat{u}_b. \tag{5}$$

As may be appreciated from the discussions above, the f function of (1) or (2) may be used to propagate LLRs from right to left within the graph. The partial sum computations of (4) and (5) may be used to propagate bits from left to right, while the g function of (3) may be used to switch from propagating bits to propagating LLRs. In order that LLRs can be propagated from right to left, it is necessary to provide LLRs on the connections on the right-hand edge of the graph. This is performed at the start of the SC decoding process, by providing successive LLRs from the soft kernal encoded block on successive connections on the right-hand edge of the graph. Likewise, it is necessary to provide bits on the connections on the left-hand edge of the graph, in order to facilitate the propagation of bits from left to right. Here, a further data dependency beyond those described above is imposed. Suppose that the position of a particular connection on the left-hand edge of the graph corresponds to the position of an information bit in the kernal information block. In that case, the bit that is input into that connection depends on the LLR that is output from that connection. More specifically, if a positive LLR is output on the connection, then a value of 0 may be selected for the corresponding bit of the recovered kernal information block and then input into the connection. Meanwhile, a negative LLR allows a value of 1 to be selected for the corresponding bit of the recovered kernal information block and then input into the connection. In the case of a connection corresponding to a redundant bit within the kernal information block, the value of that redundant bit may be input into the connection as soon as it is known. Here, frozen bits always adopt the value 0, but the values of CRC and PC bits will not become available until the related information bits have been recovered. In combination, the data dependencies described above require the information bits within the recovered kernal information block to be obtained one at a time on the connections on the left-hand edge of the graph, in order from top to bottom.
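The f and g functions, the partial sums and the hard-decision rule described above can be combined into a compact recursive SC decoder. The Python below is a minimal sketch that rests on four assumptions:

- It uses the natural-order graph of Figure 2.
- It uses the min-sum f function of (2).
- Frozen positions are marked by a list of booleans.
- Every non-frozen position is treated as an information bit, so CRC and PC-frozen bits are not modelled.

The recursion visits the left-hand-edge connections from top to bottom, as the data dependencies require. A hardware decoder would organise this schedule quite differently. The names are hypothetical.

```python
def f(xa: float, xb: float) -> float:
    """Min-sum f function of (2)."""
    sign = 1.0 if (xa >= 0) == (xb >= 0) else -1.0
    return sign * min(abs(xa), abs(xb))


def g(xa: float, xb: float, ua: int) -> float:
    """g function of (3): (-1)^ua * xa + xb."""
    return (-xa if ua else xa) + xb


def sc_decode(llrs: list[float], frozen: list[bool]) -> tuple[list[int], list[int]]:
    """Return (recovered kernal information bits, partial sums on the right-hand edge)."""
    n = len(llrs)
    if n == 1:
        # Left-hand edge: frozen bits are 0; otherwise decide on the sign of the LLR.
        bit = 0 if frozen[0] or llrs[0] >= 0 else 1
        return [bit], [bit]
    half = n // 2
    a, b = llrs[:half], llrs[half:]
    # f functions propagate LLRs right-to-left into the top N/2 sub-graph.
    u_top, s_top = sc_decode([f(xa, xb) for xa, xb in zip(a, b)], frozen[:half])
    # g functions use the top partial sums to switch back to LLR propagation.
    u_bot, s_bot = sc_decode([g(xa, xb, ua) for xa, xb, ua in zip(a, b, s_top)], frozen[half:])
    # Partial sums (4) and (5) propagate the decided bits left-to-right.
    return u_top + u_bot, [p ^ q for p, q in zip(s_top, s_bot)] + s_bot


frozen = [True, True, True, False, True, False, False, False]  # arrangement of Fig. 3
x = [1, 0, 1, 0, 0, 1, 0, 1]                # kernal encoded bits for u = [0, 0, 0, 1, 0, 0, 1, 1]
llrs = [-2.0 if bit else 2.0 for bit in x]  # noiseless, hypothetical LLR magnitudes
u_hat, _ = sc_decode(llrs, frozen)
print(u_hat)  # [0, 0, 0, 1, 0, 0, 1, 1]
```

The partial sums returned at the top level are the re-encoded decisions. In this noiseless example they reproduce the kernal encoded bits x.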
More specifically, the SC decoding process begins by using the f function of (1) or (2) to propagate LLRs from the right-hand edge of the graph to the top connection on the left-hand edge of the graph, allowing the first bit to be recovered. Following this, each successive bit from top to bottom is recovered as follows. First, the partial sum computations of (4) and (5) propagate bits from left to right. Then, the g function of (3) for a particular XOR switches from bit propagation to LLR propagation. Finally, the f function propagates LLRs to the next connection on the left-hand edge of the graph, allowing the corresponding bit to be recovered. This process is illustrated in the example of Figure 5.

B. SCL decoding
In the SC decoding process described in Section III-A, the value selected for each bit in the recovered information block depends on the sign of the corresponding LLR, which in turn depends on the values selected for all previous recovered information bits. If this approach results in the selection of the incorrect value for a particular bit, then this will often result in the cascading of errors in all subsequent bits. The selection of an incorrect value for an information bit may be detected with consideration of the subsequent frozen bits, since the decoder knows that these bits should have values of 0. More specifically, if the corresponding LLR has a sign that would imply a value of 1 for a frozen bit, then this suggests that an error may have been made during the decoding of one of the preceding information bits. However, in the SC decoding process, there is no opportunity to consider alternative values for the preceding information bits. Once a value has been selected for an information bit, the SC decoding process moves on and the decision is final. This motivates SCL decoding [6], which enables a list of alternative values for the information bits to be considered.
As the decoding process progresses, it considers both options for the value of each successive information bit. More specifically, an SCL decoder maintains a list of candidate kernal information blocks, where the list and the kernal information blocks are built up as the SCL decoding process proceeds. At the start of the process, the list comprises only a single kernal information block having a length of zero bits. Whenever the decoding process reaches a frozen bit, a bit value of 0 is appended to the end of each candidate kernal information block in the list. However, whenever the decoding process reaches an information bit, two replicas of the list of candidate kernal information blocks are created. Here, the bit value 0 is appended to each block in the first replica and the bit value 1 is appended to each block in the second replica. Following this, the two lists are merged to form a new list that is twice as long as the original list. This continues until the length of the list reaches a limit L, which is typically chosen as a power of two. From this point onwards, each time the length of the list is doubled when considering an information bit, the worst L among the 2L candidate kernal information blocks are identified and pruned from the list. In this way, the length of the list is maintained at L until the SCL decoding process completes.

Fig. 5: Example SC decoding process, using the N = 8 polar code graph, for the case where a particular arrangement of frozen bits is used to convert a particular set of M = 8 encoded LLRs into K = 4 recovered information bits. The LLRs obtained using the f and g functions of (2) and (3) are shown above each connection. The bits obtained using the partial sum computations of (4) and (5) are shown below each connection. The accompanying numbers in parentheses identify the step of the SC decoding process where the corresponding LLR or bit becomes available.
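The list management described above can be sketched in Python as follows. Each candidate is scored with the min-sum path metric of (7), which is given later in this section. For clarity, this hypothetical implementation recomputes the LLR of each new bit from scratch for every candidate. It does so by recursing through the natural-order graph of Figure 2 with the f and g functions of (2) and (3). Practical decoders instead keep replicas of the intermediate LLRs and partial sums. The sketch assumes a boolean frozen-position mask and omits the CRC check used in CRC-aided SCL decoding.

```python
def polar_encode_kernal(u):
    n = len(u)
    if n == 1:
        return list(u)
    top, bottom = polar_encode_kernal(u[: n // 2]), polar_encode_kernal(u[n // 2 :])
    return [a ^ b for a, b in zip(top, bottom)] + bottom


def f(xa, xb):
    sign = 1.0 if (xa >= 0) == (xb >= 0) else -1.0
    return sign * min(abs(xa), abs(xb))


def g(xa, xb, ua):
    return (-xa if ua else xa) + xb


def bit_llr(llrs, prefix):
    """LLR on left-hand edge connection len(prefix), given the earlier decided bits."""
    n = len(llrs)
    if n == 1:
        return llrs[0]
    half = n // 2
    a, b = llrs[:half], llrs[half:]
    if len(prefix) < half:  # bit lies in the top sub-graph: f functions only
        return bit_llr([f(xa, xb) for xa, xb in zip(a, b)], prefix)
    s_top = polar_encode_kernal(prefix[:half])  # partial sums of the decided top half
    return bit_llr([g(xa, xb, ua) for xa, xb, ua in zip(a, b, s_top)], prefix[half:])


def path_metric(prev, bit, llr):
    """Min-sum metric (7): penalise a bit that disagrees with the sign of its LLR."""
    hard = 0 if llr >= 0 else 1
    return prev if bit == hard else prev + abs(llr)


def scl_decode(llrs, frozen, list_size):
    paths = [([], 0.0)]  # a single candidate of length zero bits
    for j in range(len(llrs)):
        candidates = []
        for bits, metric in paths:
            llr = bit_llr(llrs, bits)
            for bit in (0,) if frozen[j] else (0, 1):  # frozen: append 0 only
                candidates.append((bits + [bit], path_metric(metric, bit, llr)))
        candidates.sort(key=lambda c: c[1])  # lowest metric is best
        paths = candidates[:list_size]       # prune the worst once the list exceeds L
    return paths


frozen = [True, True, True, False, True, False, False, False]
llrs = [-2.0, 2.0, -2.0, 2.0, 2.0, -2.0, 2.0, -2.0]  # noiseless LLRs for u = [0,0,0,1,0,0,1,1]
best_bits, best_metric = scl_decode(llrs, frozen, list_size=4)[0]
print(best_bits, best_metric)  # [0, 0, 0, 1, 0, 0, 1, 1] 0.0
```

With list_size = 1, this sketch reduces to SC decoding: it always keeps the single candidate whose new bit agrees with the sign of its LLR.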
Throughout this process, the worst candidate kernal information blocks are identified by comparing and sorting metrics that are computed for each block [7], based on the LLRs obtained on the left-hand edge of the polar code graph. These LLRs are obtained throughout the SCL decoding process by using separate replicas of the partial sum computations of (4) and (5) to propagate the bits from each candidate kernal information block into the polar code graph, from left to right. Following this, separate replicas of the f and g computations of (1)–(3) may be used to propagate corresponding replicas of the LLRs from right to left, as in the SC decoding process described in Section III-A. The metric associated with appending the bit value û_{l,j} in the position j ∈ [0, N − 1] to the candidate kernal information block l is given by

$$\phi_{l,j}(\hat{u}_{l,j}) = \phi_{l,j-1} + \ln\left(1 + e^{-(1 - 2\hat{u}_{l,j})\tilde{x}_{l,j}}\right) \tag{6}$$
$$\phi_{l,j}(\hat{u}_{l,j}) \approx \begin{cases} \phi_{l,j-1} & \text{if } \hat{u}_{l,j} = \tfrac{1}{2}\left(1 - \mathrm{sign}(\tilde{x}_{l,j})\right) \\ \phi_{l,j-1} + |\tilde{x}_{l,j}| & \text{otherwise,} \end{cases} \tag{7}$$

where x̃_{l,j} is the corresponding LLR and φ_{l,j−1} is the metric that was calculated for the candidate kernal information block in the previous step of the SCL decoding process. Here, (7) is referred to as the min-sum approximation. Note that since the metrics accumulate across all bit positions j ∈ [0, N − 1], they must be calculated for all L candidate kernal information blocks whenever a frozen bit value of 0 is appended, as well as for all 2L candidates when both possible values of an information bit are considered. In the latter case, the 2L metrics are sorted and the L candidates having the highest values are identified as being the worst and are pruned from the list. Following the completion of the SCL decoding process, the candidate kernal information block having the lowest metric may be selected as the recovered kernal information block. Alternatively, in CRC-aided SCL decoding [8], all candidates in the list that do not satisfy a CRC are pruned, before the candidate having the lowest metric is selected and output. The error correction capability of the NR polar code is characterised from Figure 9 onwards.

IV. CHALLENGES OF HARDWARE IMPLEMENTATION
There are several challenges associated with the hardware implementation of polar encoders and, in particular, polar decoders. This section begins by discussing challenges that are common to the implementation of both polar encoders and polar decoders, before discussing additional challenges that are specific to polar decoders.

Data dependencies. As described in Sections II and III, the polar encoding and decoding processes are characterised by particular data dependencies, which require the various processing operations to be completed in a particular sequence.
These dependencies limit the degree of parallel processing that can be achieved in implementations of polar encoders and decoders. This is particularly challenging in the case of polar decoders, owing to the serial nature of the SC and SCL algorithms. More specifically, the corresponding data dependencies require the kernal information bits to be recovered one after another, in order from top to bottom of the polar code graph. During the polar decoding process, the data dependencies allow different numbers of operations to be completed in parallel at different times, as illustrated in the example of Figure 5. In order to minimise the number of steps required to complete the decoding process, a large amount of hardware may be used, so that a single processing step is sufficient to complete the largest number of parallel operations permitted by the decoder data dependencies. However, the data dependencies will prevent much of this hardware from being used throughout the rest of the decoding process, which may motivate the use of a smaller amount of hardware and a greater number of steps. Either way, the ratio of hardware resource usage to the latency required to complete the decoding process may be unfavourable, unless sophisticated alternative techniques can be developed and utilised.

Routing. A particular challenge in the implementation of polar encoders and decoders is routing the correct information to the correct hardware components at the correct time. As illustrated by the graph representations of Figure 2, the polar encoder and decoder include intricate networks of internal connections, particularly as the kernal size N becomes large. Unless sophisticated techniques for routing information around the polar code graph are developed, large interconnection networks are required to enable information to be routed between each pairing of hardware components. This is a particular challenge in the polar decoder where, for example, partial sum bits must be routed from the left-hand edge of the graph to the g function computations that are distributed throughout the graph.

Flexibility. The 5G NR polar code is required to support a wide variety of kernal sizes N, up to a maximum of $N_{\max} = 1024$ bits. This requires a compromise to be struck between providing enough hardware to process the longest lengths with a low latency, and providing so much hardware that it cannot be fully exploited when processing the shortest lengths. Unless sophisticated techniques for managing this challenge are developed, a poor ratio of hardware resource usage to decoding latency will result for either the short or the long lengths.

Interlacing. As described in Sections II and III, the conditioning components of the polar encoder and decoder are required to insert or remove bits in the various blocks, in order to transform between the sizes K, N and M. The specific positions of the inserted or removed bits depend on the particular combination of K, N and M, requiring very flexible interlacer and deinterlacer circuits that are capable of inserting or removing an arbitrary number of bits in arbitrary positions within the various blocks. Sophisticated techniques are therefore required to achieve hardware-efficient conditioning with low latency.

Complicated conditioning.
The information conditioning and encoded conditioning employed in the NR polar code are very complicated, since they include code segmentation, CRC attachment, CRC interleaving, CRC scrambling, PC and frozen bit insertion, sub-block interleaving, rate matching, channel interleaving and code concatenation, as shown in Figures 6–8. Furthermore, there are intricate interdependencies between these operations; for example, the frozen bit insertion process in the information conditioning depends on the rate matching operation in the encoded conditioning. In contrast to other channel codes, where the various information and encoded conditioning operations can be completed separately in independent processing blocks, the NR polar code requires its processing blocks to be tightly coupled in order to maximise the achievable performance.

The following challenges are specific to the implementation of polar decoders.

Decoder complexity. The complexity of a polar decoder is much greater than that of a polar encoder, for three reasons. Firstly, while polar encoders operate on bits, polar decoders operate on the probabilities of bits, which require more memory to store and more complex computations. Secondly, while polar encoders only have to consider the particular permutation of the information block that they are presented with, polar decoders must consider all possible permutations of the information block and select the one that is most likely. Finally, while polar encoders process each information block only once, an SCL polar decoder must process each information block L times, in order to achieve sufficiently strong error correction. For these reasons, the latency, hardware resource usage and power consumption of polar decoders are typically orders of magnitude greater than those of polar encoders.

Copy. As described in Section III-B, the SCL decoding process creates replicas of the list of candidate kernal information blocks, as well as all associated intermediate LLRs and bits. However, copying this large amount of information within a hardware implementation imposes particular challenges for the memory architecture. One option is to employ memory blocks having very large bandwidths, allowing the copy process to be completed within a small number of steps. Alternatively, the copy process could be completed over many steps, requiring only a moderate memory bandwidth. Either way, the ratio of hardware resource usage to decoding latency is unfavourable, unless sophisticated alternative techniques can be developed and utilised. This challenge is particularly important, since the hardware resource usage of polar decoders is typically dominated by memory.

Sort. Another key challenge in the implementation of the SCL decoding process is imposed by metric sorting. As described in Section III-B, this sort is required in order to identify and prune the worst L candidate kernal information blocks from the merged list of 2L candidates. One option is to employ a large amount of hardware to simultaneously compare every one of the 2L candidates with every other candidate, so that the sorting can be completed with a short latency. Alternatively, the hardware resource requirement can be reduced by structuring successive comparisons to efficiently reuse intermediate results, at the cost of increasing the latency required to rank the 2L candidates. Either way, the ratio of hardware resource usage to decoding latency is unfavourable, unless sophisticated alternative techniques can be developed and utilised.

CRC integration.
CRC bits are employed by the NR polar code in order to facilitate error detection and also to improve the error correction capability of the polar decoder. However, there is a tradeoff between the error detection capability and the error correction capability. In order to meet the error detection reliability requirements of NR, the CRC bits must be handled very carefully, in a manner which is not captured in the NR standards. In particular, the CRC (and PC) bits must be decoded as an integral part of the polar decoding process, using an unconventional decoding technique. This is in contrast to conventional CRCs, which may be decoded separately from other channel codes, in independent processing blocks, leading to a much simpler implementation.

V. CONCLUSIONS

In this white paper, we have discussed the selection of polar codes in the 5G NR standard and have provided tutorials on the polar encoding and decoding processes, paying particular attention to the SC and SCL decoding algorithms. Furthermore, we have discussed the challenges associated with the hardware implementation of polar encoders and decoders, noting that these challenges are particularly great in the case of the polar decoder, since its complexity is orders of magnitude greater than that of the polar encoder.

At AccelerComm, we have been researching polar codes since they were first published in 2009. We have drawn upon our expertise and intuition for polar codes in order to develop polar encoder and decoder solutions that address all of the challenges described in this white paper. We offer patent-pending, first-to-market polar encoder and decoder Intellectual Property (IP), which allows all of the 5G requirements to be met in Field Programmable Gate Array (FPGA) and Application Specific Integrated Circuit (ASIC) implementations. More specifically, we have developed sophisticated solutions that overcome all of the challenges described in Section IV, offering much greater flexibility, error correction capability and hardware efficiency than all previously published implementations of polar encoders and decoders.

ACCELERCOMM WHITE PAPER: THE IMPLEMENTATION CHALLENGES OF POLAR CODES

REFERENCES

[1] E. Arikan, "Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels," IEEE Transactions on Information Theory, vol. 55, no. 7, pp. 3051–3073, July 2009.
[2] 3rd Generation Partnership Project; Technical Specification Group Radio Access Network; NR; Multiplexing and channel coding (Release 15), 3GPP Std. TS 38.212, Rev. 15.0.0, December 2017.
[3] K. Niu and K. Chen, "CRC-aided decoding of polar codes," IEEE Communications Letters, vol. 16, no. 10, pp. 1668–1671, October 2012.
[4] Huawei, HiSilicon, "Polar code construction for NR," in 3GPP TSG RAN WG1 Meeting #86bis, Lisbon, Portugal, October 2016, R1-1608862.
[5] ZTE, ZTE Microelectronics, "Rate matching of polar codes for eMBB," in 3GPP TSG RAN WG1 Meeting #88, Athens, Greece, February 2017, R-762.
[6] I. Tal and A. Vardy, "List decoding of polar codes," in 2011 IEEE International Symposium on Information Theory Proceedings, July 2011, pp. 1–5.
[7] A. Balatsoukas-Stimming, M. B. Parizi, and A. Burg, "LLR-based successive cancellation list decoding of polar codes," IEEE Transactions on Signal Processing, vol. 63, no. 19, pp. 5165–5179, October 2015.
[8] K. Niu and K. Chen, "CRC-aided decoding of polar codes," IEEE Communications Letters, vol. 16, no. 10, pp. 1668–1671, October 2012.
[9] T. Erseghe, "Coding in the finite-length regime: Bounds based on Laplace integrals and their asymptotic approximations," IEEE Transactions on Information Theory, vol. 62, no. 12, pp. 6854–6883, December 2016.

Prof. Robert G. Maunder is an industry authority on error correction and channel coding.
As a professor at the University of Southampton, he built a team of experts and published over 100 IEEE papers and resources on the joint design of algorithms and hardware implementations for error correction, including turbo, LDPC and polar codes. This expertise is being leveraged through Prof. Maunder's founding of AccelerComm, a semiconductor IP-core company specialising in patent-pending channel coding solutions.

[Figure 6 (block diagram). Encoder: determination of higher layer parameters; PBCH payload generation, interleaving and scrambling; CRC24C attachment; CRC interleaving; polar encoding; sub-block interleaving; rate matching; multiplexing onto the PBCH. Decoder: demultiplexing from the PBCH; rate dematching; sub-block deinterleaving; distributed-CRC-aided SCL polar decoding, using the determined known bits; PBCH payload descrambling, deinterleaving and extraction. Key: (x.x.x.x) denotes a section of TS 38.212.]
Fig. 6: Block diagram of the polar encoder and decoder employed by the Physical Broadcast Channel (PBCH) of 3GPP New Radio.

[Figure 7 (block diagram). Encoder: determination of the RNTI and encoded length; DCI bit sequence generation; zero padding; ones-initialised CRC24C attachment; CRC scrambling; CRC interleaving; polar encoding; sub-block interleaving; rate matching; multiplexing onto the PDCCH. Decoder: demultiplexing from the PDCCH; rate dematching; sub-block deinterleaving; distributed-CRC-aided SCL polar decoding, using the determined RNTI and information length; DCI bit sequence extraction. Key: (x.x.x.x) denotes a section of TS 38.212.]
Fig. 7: Block diagram of the polar encoder and decoder employed by the Physical Downlink Control Channel (PDCCH) of 3GPP New Radio.
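The "ones-initialised CRC24C attachment" and "CRC scrambling" blocks of Fig. 7 can be sketched as follows. This is a minimal illustration based on our reading of TS 38.212 [2], not a verified implementation. It assumes the generator polynomial $g_{\mathrm{CRC24C}}(D) = D^{24} + D^{23} + D^{21} + D^{20} + D^{17} + D^{15} + D^{13} + D^{12} + D^{8} + D^{4} + D^{2} + D + 1$, that "ones-initialised" means computing the parity over the DCI bits with 24 ones prepended, and that the 16-bit RNTI (most significant bit first) scrambles the last 16 CRC bits. Zero padding and CRC interleaving are omitted, and the function names are hypothetical.

```python
# Exponents of the assumed CRC24C generator polynomial g(D), i.e. 0x1B2B117.
CRC24C_EXPONENTS = (24, 23, 21, 20, 17, 15, 13, 12, 8, 4, 2, 1, 0)


def crc_parity(bits, exponents=CRC24C_EXPONENTS):
    """Return the parity bits of bits(D) * D^L mod g(D), most significant first."""
    length = max(exponents)
    poly = [0] * (length + 1)
    for e in exponents:
        poly[length - e] = 1            # poly[0] is the coefficient of D^L
    reg = [b & 1 for b in bits] + [0] * length
    for i in range(len(bits)):          # binary polynomial long division
        if reg[i]:
            for j, p in enumerate(poly):
                reg[i + j] ^= p
    return reg[-length:]


def attach_dci_crc(dci_bits, rnti):
    """Hypothetical helper: append a ones-initialised, RNTI-scrambled CRC24C."""
    parity = crc_parity([1] * 24 + list(dci_bits))        # 24 ones prepended
    rnti_bits = [(rnti >> (15 - k)) & 1 for k in range(16)]
    scrambled = parity[:8] + [p ^ r for p, r in zip(parity[8:], rnti_bits)]
    return list(dci_bits) + scrambled                     # the ones are not sent


if __name__ == "__main__":
    dci = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1]            # arbitrary 12-bit payload
    print(attach_dci_crc(dci, rnti=0x4601))
```

Under the same assumptions, a receiver could test a candidate by descrambling its last 16 bits with the expected RNTI and checking that the ones-prefixed sequence leaves a zero remainder. Because the RNTI only enters through the scrambling step, a decoder cannot separate CRC checking from the identity of the intended receiver.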

[Figure 8 (block diagram of the PUCCH/PUSCH encoder and decoder). Encoder for K ∈ [12, 1706]: UCI bit sequence generation; code segmentation; CRC6 or CRC11 attachment; PC and frozen bit insertion; polar encoding; sub-block interleaving; rate matching; channel interleaving; code concatenation; multiplexing onto the PUCCH/PUSCH. Encoder for K ∈ [1, 11]: short encoding and rate matching, identical to LTE. The decoder reverses these steps, using PC/CRC-aided SCL polar decoding for K ∈ [12, 1706] and short decoding, identical to LTE, for K ∈ [1, 11]. Key: (x.x.x.x) denotes a section of TS 38.212.]
Fig. 8: Block diagram of the polar encoder and decoder employed by the Physical Uplink Control Channel (PUCCH) of 3GPP New Radio.
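The sub-block interleaving and rate matching blocks of Fig. 8 are where the conditioning transforms between the kernal size N and the encoded block length M, as discussed under Interlacing. A minimal sketch follows, based on our reading of Section 5.4.1 of TS 38.212 [2], where the encoded length is denoted E rather than M. The 32-entry pattern, the puncturing threshold K/M ≤ 7/16 and the function names are assumptions to be checked against the specification. Channel interleaving and the dependent frozen bit adjustments are not shown.

```python
# Assumed sub-block interleaver pattern P(i), i = 0..31 (TS 38.212 Table 5.4.1.1-1).
SUBBLOCK_PATTERN = (0, 1, 2, 4, 3, 5, 6, 7, 8, 16, 9, 17, 10, 18, 11, 19,
                    12, 20, 13, 21, 14, 22, 15, 23, 24, 25, 26, 28, 27, 29, 30, 31)


def subblock_interleave(d):
    """Reorder N encoded bits as 32 equal sub-blocks permuted by P(i)."""
    n = len(d)
    if n < 32 or n & (n - 1):
        raise ValueError("N must be a power of two and at least 32")
    size = n // 32                                   # bits per sub-block
    return [d[SUBBLOCK_PATTERN[(32 * k) // n] * size + k % size] for k in range(n)]


def bit_selection(y, k, m):
    """Select M of the N interleaved bits by repetition, puncturing or shortening.

    Assumption: k counts every bit carried by the kernal's information set.
    """
    n = len(y)
    if m >= n:                      # repetition: read the circular buffer cyclically
        return [y[j % n] for j in range(m)]
    if k / m <= 7 / 16:             # puncturing: discard the first N - M bits
        return list(y[n - m:])
    return list(y[:m])              # shortening: discard the last N - M bits


if __name__ == "__main__":
    y = subblock_interleave(list(range(64)))
    print(y[:8])
    print(len(bit_selection(y, k=20, m=40)))
```

Which bits are punctured or shortened depends jointly on K, N and M, which is why the frozen bit insertion in the information conditioning cannot be designed independently of this step.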

[Plot: "PBCH polar code, K = 32, M = 864, QPSK, AWGN"; BLER versus $E_s/N_0$ [dB] for list sizes L = 1, 2, 4, 8, 16 and 32, together with capacity.]
Fig. 9: Plot of Block Error Rate (BLER) versus channel Signal to Noise Ratio (SNR) $E_s/N_0$ for the Physical Broadcast Channel (PBCH) polar code of 3GPP New Radio, when using Quadrature Phase Shift Keying (QPSK) for communication over an Additive White Gaussian Noise (AWGN) channel. Here, K is the number of bits in each information block, M is the number of bits in each encoded block and L is the list size used during min-sum Successive Cancellation List (SCL) decoding. The simulation of each SNR was continued until errors were observed. Capacity plots are provided by the $O(n^{-2})$ metaconverse PPV upper bound [9].
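Fig. 9 uses min-sum SCL decoding. As a sketch of how the metric updates (6) and (7), and the subsequent pruning of the worst L of the 2L candidates, might be expressed in software, the following Python assumes that a positive LLR favours a bit value of 0 (consistent with the decision rule in (7)) and treats sign(0) as +1. Candidate bookkeeping, LLR propagation and the copying of intermediate LLRs and bits are omitted, and the function names are hypothetical.

```python
import math


def metric_exact(phi_prev, u, llr):
    """Metric update (6): phi_prev + ln(1 + exp(-(1 - 2u) * llr))."""
    z = -(1 - 2 * u) * llr
    # ln(1 + e^z) evaluated without overflow for large positive z
    return phi_prev + (z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z)))


def metric_min_sum(phi_prev, u, llr):
    """Min-sum approximation (7): no penalty if u matches the LLR's hard decision."""
    hard = 0 if llr >= 0 else 1          # assumption: sign(0) is treated as +1
    return phi_prev if u == hard else phi_prev + abs(llr)


def extend_and_prune(metrics, llrs, list_size, update=metric_min_sum):
    """Extend every candidate with u = 0 and u = 1 at an information bit, keep the best L.

    Returns (parent_index, bit, metric) tuples for the survivors, lowest metric first.
    """
    children = [(l, u, update(phi, u, x))
                for l, (phi, x) in enumerate(zip(metrics, llrs))
                for u in (0, 1)]
    children.sort(key=lambda c: c[2])    # full sort of the 2L metrics, for clarity only
    return children[:list_size]


if __name__ == "__main__":
    print(extend_and_prune([0.0, 1.5], [2.0, -0.5], list_size=2))
```

The full sort is used here only for readability; as noted under Sort in Section IV, a hardware decoder has to trade comparator count against latency when ranking the 2L metrics.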

[Plot: "PDCCH polar code, BLER = 0.001, QPSK, AWGN"; required $E_s/N_0$ [dB] versus K from 8 to 128, for M = 108, 216, 432, 864 and 1728, each with L = 8, L = 16 and capacity.]
Fig. 10: Plot of the Signal to Noise Ratio (SNR) $E_s/N_0$ required to achieve a Block Error Rate (BLER) of $10^{-3}$ versus the number of bits K in each information block, for the Physical Downlink Control Channel (PDCCH) polar code of 3GPP New Radio, when using Quadrature Phase Shift Keying (QPSK) for communication over an Additive White Gaussian Noise (AWGN) channel. Here, M is the number of bits in each encoded block and L is the list size used during min-sum Successive Cancellation List (SCL) decoding. The simulation of each SNR was continued until errors were observed. Capacity plots are provided by the $O(n^{-2})$ metaconverse PPV upper bound [9].
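Figs. 9 to 11 report performance against $E_s/N_0$ for QPSK over AWGN. A minimal sketch of how channel LLRs for such a simulation might be generated is given below. It assumes Gray-mapped QPSK with unit symbol energy $E_s = 1$, complex noise of total variance $N_0$ ($N_0/2$ per dimension) and LLRs defined as $\ln(P(b=0)/P(b=1))$. Under these assumptions each LLR equals $2\sqrt{2}\,y/N_0$, where $y$ is the corresponding real or imaginary part of the received sample. The function name is hypothetical.

```python
import math
import random


def qpsk_awgn_llrs(bits, es_n0_db, rng=None):
    """Map bit pairs to Gray QPSK (Es = 1), add AWGN and return per-bit LLRs.

    A positive LLR favours a bit value of 0. `bits` must have even length.
    """
    if len(bits) % 2:
        raise ValueError("QPSK needs an even number of bits")
    rng = rng or random.Random()
    n0 = 10 ** (-es_n0_db / 10)          # with Es = 1, N0 = 1 / (Es/N0)
    sigma = math.sqrt(n0 / 2)            # noise standard deviation per dimension
    amp = 1 / math.sqrt(2)               # each of I and Q carries Es / 2
    llrs = []
    for b in bits:                       # I and Q see independent real-valued channels
        y = (1 - 2 * b) * amp + rng.gauss(0.0, sigma)
        llrs.append(2 * math.sqrt(2) * y / n0)
    return llrs


if __name__ == "__main__":
    print(qpsk_awgn_llrs([0, 1, 1, 0], es_n0_db=-2.0, rng=random.Random(7)))
```

Note that the LLR magnitude scales with $1/N_0$, so fixed-point decoders typically need to quantise LLRs over a wide range of operating SNRs.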

[Plot: "PUCCH polar code, BLER = 0.001, QPSK, AWGN"; required $E_s/N_0$ [dB] versus K from 8 to 2048, for M = 54, 108, 216, 432, 864, 1728, 3456, 6912 and 13824, each with L = 8, L = 16 and capacity.]
Fig. 11: Plot of the Signal to Noise Ratio (SNR) $E_s/N_0$ required to achieve a Block Error Rate (BLER) of $10^{-3}$ versus the number of bits K in each information block, for the Physical Uplink Control Channel (PUCCH) polar code of 3GPP New Radio, when using Quadrature Phase Shift Keying (QPSK) for communication over an Additive White Gaussian Noise (AWGN) channel. Here, M is the number of bits in each encoded block and L is the list size used during min-sum Successive Cancellation List (SCL) decoding. The simulation of each SNR was continued until errors were observed. Capacity plots are provided by the $O(n^{-2})$ metaconverse PPV upper bound [9].
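When comparing the required $E_s/N_0$ values in these figures with results quoted in terms of $E_b/N_0$, a conversion can be sketched as follows. It assumes that QPSK carries 2 encoded bits per symbol, so that M encoded bits occupy M/2 symbols whose energy is shared by K information bits. This gives $E_b/N_0 = E_s/N_0 + 10\log_{10}\left(M/(2K)\right)$ in dB. The result depends on whether K is taken to include CRC bits, and the function name is hypothetical.

```python
import math


def es_n0_to_eb_n0_db(es_n0_db, k, m, bits_per_symbol=2):
    """Convert Es/N0 to Eb/N0 (both in dB) for K information bits sent in M encoded bits."""
    if k <= 0 or m <= 0:
        raise ValueError("K and M must be positive")
    symbols = m / bits_per_symbol          # number of modulation symbols per block
    return es_n0_db + 10 * math.log10(symbols / k)


if __name__ == "__main__":
    print(es_n0_to_eb_n0_db(0.0, k=32, m=864))
```

For the PBCH configuration of Fig. 9 (K = 32, M = 864), the conversion adds $10\log_{10}(13.5) \approx 11.3$ dB. The low code rate of that configuration is why its $E_s/N_0$ values are so strongly negative.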