An Improved Hardware Implementation of the Grain-128a Stream Cipher

Shohreh Sharif Mansouri and Elena Dubrova
Department of Electronic Systems, Royal Institute of Technology (KTH), Stockholm
Email: {shsm,dubrova}@kth.se

Overview
- Motivation
- Structure of Grain-128a
- Four techniques to improve the implementation
- Experimental results
- Conclusion

The Main Goal
- Improve Grain-128a in terms of throughput, area and power.
- We achieve this by modifying the architecture of Grain without changing its algorithm.
- Throughput increases by 52% on average.

Grain Family of Stream Ciphers
- Supports algorithms with 80-bit and 128-bit keys
- Supports degrees of parallelization of 4, 8, 16 and, for Grain-128, 32
- The new version of Grain-128, Grain-128a, supports authentication with a maximal tag length of 32 bits

Grain-128a
The cipher is divided into two parts:
- Keystream generator section
- Authentication section

Cipher Phases
The cipher goes through the following phases:
- Loading of the key and the initial value (IV)
- Keystream initialization phase: 256 clock cycles, no output bits
- Keystream generation phase
- Authentication initialization phase: 64 clock cycles, initializing the accumulator and the authentication shift register
- Operational phase: producing the output stream and the tag

How to Improve Throughput?
Throughput is determined by the critical path, which is the longest combinational path in the system. There are five potential candidates for the critical path:
- Dn: maximal delay from any NLFSR flip-flop to any other NLFSR flip-flop
- Dhy: maximal delay from any NLFSR or LFSR flip-flop through the h and y functions to the output of the cipher
- Dhya: maximal delay from any NLFSR or LFSR flip-flop through the h and y functions to any accumulator flip-flop
- Da: maximal delay from any flip-flop in the authentication section of the cipher to any accumulator flip-flop
- Dhyn: maximal delay from a flip-flop of the NLFSR or LFSR through the h and y functions to the first flip-flop of the NLFSR (only during the initialization phase)
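As a rough illustration of the rule above, the sketch below picks the critical path among the five candidates and derives a clock frequency and a throughput from it. The delay values are invented for illustration, and the relation "throughput = degree of parallelization x clock frequency" is an assumption of this sketch (one keystream bit per cycle for each degree of parallelization), not a figure taken from the slides.

```python
# Hypothetical path delays in nanoseconds; these are NOT measurements from the paper.
DELAYS_NS = {"Dn": 0.6, "Dhy": 0.8, "Dhya": 0.9, "Da": 0.5, "Dhyn": 1.0}


def critical_path(delays_ns):
    """Return (name, delay) of the slowest candidate path."""
    name = max(delays_ns, key=delays_ns.get)
    return name, delays_ns[name]


def max_frequency_ghz(delays_ns):
    """The clock period must cover the critical path, so f_max = 1 / D_max."""
    _, delay = critical_path(delays_ns)
    return 1.0 / delay


def throughput_gbps(delays_ns, parallelization):
    """Assumes `parallelization` keystream bits are produced per clock cycle."""
    return parallelization * max_frequency_ghz(delays_ns)


if __name__ == "__main__":
    print(critical_path(DELAYS_NS))
    print(throughput_gbps(DELAYS_NS, parallelization=4))
```

With these made-up numbers Dhyn limits the clock, which is the situation the later slides address.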

Our Approach
We apply four techniques:
1. Isolation of the authentication section (improving Dhya)
2. Fibonacci-to-Galois transformation of the shift registers (improving Dn)
3. Multi-frequency implementation (improving Dhyn)
4. Internal pipelining (improving Dhy)

1. Isolation of the Auth. Section
- Problem: Dhya increases as the degree of parallelization of Grain-128a grows.
- Possible solution: isolate the authentication section by inserting flip-flops in the authentication section, on the outputs of the h/y function.
- This solution:
  - adds one cycle of latency
  - has no effect on security

2. Fibonacci to Galois Transformation
- Improves Dhyn and Dn
- Brings no area or power penalty

Fibonacci to Galois Transformation*
Fibonacci configuration (critical delay = 5):
  f3 = x0 + x2x3 + x1x2
  f2 = x3
  f1 = x2
  f0 = x1
Galois configuration (critical delay = 3):
  f3 = x0 + x2x3
  f2 = x3 + x0x1
  f1 = x2
  f0 = x1
*E. Dubrova, "A Transformation from the Fibonacci to the Galois NLFSRs", IEEE Transactions on Information Theory, 55:11, 2009, pp. 5263-5271

Example
The transformation from Fibonacci to Galois is not unique. The Fibonacci NLFSR
  f3 = x1x2 + x1x3 + x0
  f2 = x3
  f1 = x2
  f0 = x1
can be transformed into either of the following Galois NLFSRs:
  f3 = x1x2 + x0
  f2 = x3 + x0x2
  f1 = x2
  f0 = x1
or
  f3 = x0
  f2 = x3 + x0x1 + x0x2
  f1 = x2
  f0 = x1
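The equivalence of the three configurations in this example can be checked by brute force. The sketch below is a minimal simulation, assuming that bit i is updated with f_i on every clock, that "+" is XOR and juxtaposition is AND, and that the output is taken from x0. The initial-state mappings `init_a` and `init_b` are not given on the slide; they are derived here by adding to x3 the terms that were moved into f2, and should be read as an assumption of this sketch.

```python
from itertools import product


def run(state, f3, f2, n_bits=32):
    """Clock a 4-bit NLFSR with state (x0, x1, x2, x3); the output bit is x0."""
    out = []
    for _ in range(n_bits):
        x0, x1, x2, x3 = state
        out.append(x0)
        # f1 = x2 and f0 = x1 are plain shifts in all three configurations.
        state = (x1, x2, f2(x0, x1, x2, x3), f3(x0, x1, x2, x3))
    return out


# (f3, f2) pairs: the Fibonacci configuration and the two Galois ones.
FIB = (lambda x0, x1, x2, x3: (x1 & x2) ^ (x1 & x3) ^ x0,
       lambda x0, x1, x2, x3: x3)
GAL_A = (lambda x0, x1, x2, x3: (x1 & x2) ^ x0,
         lambda x0, x1, x2, x3: x3 ^ (x0 & x2))
GAL_B = (lambda x0, x1, x2, x3: x0,
         lambda x0, x1, x2, x3: x3 ^ (x0 & x1) ^ (x0 & x2))


def init_a(s):
    """Assumed mapping of a Fibonacci initial state to GAL_A."""
    x0, x1, x2, x3 = s
    return (x0, x1, x2, x3 ^ (x0 & x2))


def init_b(s):
    """Assumed mapping of a Fibonacci initial state to GAL_B."""
    x0, x1, x2, x3 = s
    return (x0, x1, x2, x3 ^ (x0 & x1) ^ (x0 & x2))


def same_sequences(galois, init_map):
    """True if every Fibonacci start state yields the same output bits."""
    return all(run(s, *FIB) == run(init_map(s), *galois)
               for s in product((0, 1), repeat=4))


if __name__ == "__main__":
    print(same_sequences(GAL_A, init_a), same_sequences(GAL_B, init_b))
```

The check covers all 16 initial states. Loading the Galois registers with the unmodified Fibonacci state does not in general reproduce the same output, which is why the mapping is needed.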

Fibonacci to Galois Transformation
- Explore the design space to find the best Galois NLFSR equivalent to a given Fibonacci NLFSR.
- Optimal algorithm: synthesize every possible combination and pick the best solution.
- This is computationally infeasible, so a heuristic approach is needed*.
*J.-M. Chabloz, S. Mansouri, E. Dubrova, "An Algorithm for Constructing a Fastest Galois NLFSR Generating a Given Sequence", in Sequences and Their Applications, LNCS 6338, 2010, pp. 41-55

Improvement on Dl and Dn
The highest improvement is achieved on Dn of Grain-128a X1.

Version   LFSR (Dl)   NLFSR (Dn)
X1        60%         67%
X2        53%         54%
X4        51%         42%
X8        26%         24%
X16       18%         13%
X32       0%          0%

3. Multi-Frequency Implementation
- The critical paths for all versions of Grain-128a are given by Dhyn.
- Although transforming Grain's configuration improves the delays Dn and Dl by up to 67%, the clock frequency of the overall cipher improves by only about 10%.
- Dhyn is active only during the keystream initialization phase.
- To support both the initialization and key generation phases efficiently, we suggest a dual-frequency implementation of Grain-128a.

Multi-Frequency Implementation
Grain-128a phases:
- Keystream initialization phase (Dhyn path)
- Keystream generation phase (Dn path)
Multi-frequency Grain-128a:
- The cipher receives only one external clock signal (the fast clock).
- A clock divider block derives the slow clock from the fast clock.
- The slower clock is used during the keystream initialization phase.
- The faster clock is used during the keystream generation phase.
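To see why two clocks can pay off, the sketch below compares the total run time of a single-clock design with that of a dual-clock design. It is a simplified timing model, not the authors' implementation: the delays are invented, the divider is assumed to divide by an integer, and the default of 256 initialization cycles is assumed for the non-parallelized cipher.

```python
def divider_ratio(d_init_ps, d_gen_ps):
    """Smallest integer divisor whose slow period covers the initialization path."""
    return -(-d_init_ps // d_gen_ps)  # ceiling division on integers


def total_time_ps(gen_cycles, d_init_ps, d_gen_ps, init_cycles=256, dual=True):
    """Time for initialization plus `gen_cycles` cycles of keystream generation.

    d_init_ps: critical delay during initialization (hypothetical value).
    d_gen_ps:  critical delay during keystream generation (hypothetical value).
    """
    if dual:
        fast = d_gen_ps
        slow = divider_ratio(d_init_ps, d_gen_ps) * fast
    else:
        # A single clock must be slow enough for both phases.
        fast = slow = max(d_init_ps, d_gen_ps)
    return init_cycles * slow + gen_cycles * fast


if __name__ == "__main__":
    single = total_time_ps(10_000, 1200, 800, dual=False)
    dual = total_time_ps(10_000, 1200, 800, dual=True)
    print(single, dual)
```

In this model the dual-clock design spends longer in initialization, because the divided clock is slower than strictly necessary, but recovers the loss once enough keystream cycles follow.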

4. Internal Pipelining
The h/y function is pipelined during the key generation phase.
- Advantage: the cipher frequency is improved during the key generation phase.
- Disadvantages:
  - overhead of the pipeline flip-flops
  - latency increases by a fixed number of cycles during the key generation phase

Throughput Improvement
Maximal improvement in frequency compared to the original design.
Columns: Grain-128a X1, X2, X4, X8, X16, X32
Freq. 67% 80% 65% 53% 32% 41% 29% 40% 33%
Area 0% -5% -7% -10% -1% -12-5 % -23% -13%
Power 3-7 -13-2% -4-11 1% 4% 1
More information about different tradeoffs can be found in the paper.

Conclusion
- High throughput improvement
- Limited area/power impact
- Techniques compatible with the standard ASIC flow
- Some techniques can be applied to other ciphers

Thank you for your attention. Questions?
F2G: http://web.it.kth.se/~dubrova/fib2gal.html