From Theory to Practice: Private Circuit and Its Ambush

Indian Institute of Technology Kharagpur, Telecom ParisTech
From Theory to Practice: Private Circuit and Its Ambush
Debapriya Basu Roy, Shivam Bhasin, Sylvain Guilley, Jean-Luc Danger and Debdeep Mukhopadhyay
20/01/2015, Debapriya Basu Roy, Weekly Presentation

Introduction
Side channel: information leakage from the implementation.
Probing attack.
Power attack and EM radiation attack.
t-private circuit: countermeasure design with sound theoretical proof.
Overhead: $O(nt^2)$, where $n$ is the number of gates in the circuit.

Related Works
Masking: a by-product of private circuits, used to protect against first-order differential attacks.
Reducing overhead: from $O(nt^2)$ to $O(nt)$, further reducing it to $t/2$.
Designing block ciphers with a reduced number of AND operations, for example PICARO.
Modifying private circuits for efficient FPGA implementation.

Motivation
Private circuits rest on a sound theoretical proof, but with some inherent assumptions that may not hold in practical scenarios.
Theoretical analysis of private circuits under power analysis in the presence of glitches has been studied.
However, no practical evaluation of private circuits is present in the literature.

Contribution
We identify the practical scenarios in which a private circuit may fail to provide the desired security.
In particular, we try to identify the lazy engineering practices that can undermine the security of private circuits.

Contribution
We have implemented the lightweight block cipher SIMON using the private circuit methodology on a SASEBO-GII board.
The implemented private circuits are analyzed against SCA using EM traces and correlation power analysis.
Moreover, we have used Test Vector Leakage Assessment (TVLA) based leakage detection to classify each design as side-channel secure or not.

Contribution
Optimized SIMON: SIMON is implemented according to the ISW scheme, but the design tool is free to optimize the circuit. This is an example of the lazy engineering approach.
2-input LUT based SIMON: Here, to mimic the private circuit methodology exactly on the FPGA, we have constrained the design tool to map each two-input gate to a single LUT. In other words, although a LUT has six inputs, it is modeled as a two-input gate and gate-level optimization is minimized.
Synchronized 2-input LUT based SIMON: This is nearly identical to the previous methodology. The only difference is that each gate or LUT is preceded and followed by flip-flops, so that every input to the gates is synchronized and glitches are minimized.

Preliminaries
SIMON
In 2013, the NSA introduced two ultra-lightweight block ciphers, SIMON and SPECK, with a Feistel construction. Of the two, SIMON is better suited to hardware implementations. SIMON can encrypt a block of $2k$ bits with a key of $mk$ bits.
TVLA
TVLA operates the device under test with a fixed, chosen key and collects two sets of measurements (in the usual TVLA setup, one set with a fixed plaintext and one with random plaintexts). A t-test is then applied to the two sets. Similar difference testing can be performed on intermediate values of the block cipher, and also on each bit of such an intermediate value.
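The t-test step of TVLA can be sketched as a pointwise Welch's t-test over two sets of traces. This is only a minimal illustration: the function names `welch_t` and `leaks` are hypothetical, and the 4.5 threshold is the value conventionally used with TVLA, assumed here because the slides do not state the safe value they used.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(set_a, set_b):
    """Welch's t-statistic at every sample point.

    set_a and set_b are lists of traces; each trace is an equal-length
    list of floats (one value per sample point).
    """
    t_values = []
    for col_a, col_b in zip(zip(*set_a), zip(*set_b)):
        # Sample variances (n - 1 in the denominator).
        var_a, var_b = variance(col_a), variance(col_b)
        denom = sqrt(var_a / len(col_a) + var_b / len(col_b))
        diff = mean(col_a) - mean(col_b)
        t_values.append(diff / denom if denom else 0.0)
    return t_values


def leaks(t_values, threshold=4.5):
    """True if any sample point exceeds the (assumed) safe threshold."""
    return any(abs(t) > threshold for t in t_values)


# Toy traces with two sample points: the first is identical in both sets,
# the second has a large mean difference.
fixed_set = [[1.0, 5.0], [1.2, 5.1], [0.8, 4.9], [1.0, 5.0]]
random_set = [[1.0, 1.0], [1.2, 1.1], [0.8, 0.9], [1.0, 1.0]]
t_stats = welch_t(fixed_set, random_set)
print(t_stats, leaks(t_stats))
```

A design would be flagged as leaking when at least one sample point crosses the threshold, which is how the TVLA plots later in the deck are read.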

t-private Circuit
Input encoding: a bit $a$ is encoded as a vector $(a_1, a_2, \ldots, a_{2t}, a_{2t+1})$, where $a_1, \ldots, a_{2t}$ are random bits and
$a_{2t+1} = a \oplus \bigoplus_{i=1}^{2t} a_i$.
NOT gate: input $\tilde{a} = (a_1, a_2, \ldots, a_{2t+1})$, output $\tilde{\bar{a}} = (\bar{a}_1, a_2, \ldots, a_{2t+1})$.
XOR gate: $c_i = a_i \oplus b_i$, $1 \le i \le 2t+1$.
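The encoding and the two linear gates can be sketched in a few lines of software. This is an illustrative model only, not the hardware design from the talk; the helper names (`encode`, `decode`, `not_gate`, `xor_gate`) are hypothetical, and the NOT gate assumes the usual ISW rule of complementing a single share.

```python
import secrets
from functools import reduce
from operator import xor


def encode(a, t):
    """Split bit a into 2t + 1 shares whose XOR equals a."""
    shares = [secrets.randbits(1) for _ in range(2 * t)]
    shares.append(a ^ reduce(xor, shares, 0))
    return shares


def decode(shares):
    """Recombine shares by XOR-ing all of them."""
    return reduce(xor, shares, 0)


def not_gate(a_shares):
    """Complement one share only (assumed ISW rule), flipping the value."""
    return [a_shares[0] ^ 1] + a_shares[1:]


def xor_gate(a_shares, b_shares):
    """Share-wise XOR over all 2t + 1 coordinates."""
    return [x ^ y for x, y in zip(a_shares, b_shares)]


print(decode(xor_gate(encode(1, 2), encode(0, 2))))  # XOR of 1 and 0
```

Both gates are linear, so they need no fresh randomness; only the AND gate does.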

t-private Circuit
AND gate: inputs $\tilde{a} = (a_1, a_2, \ldots, a_{2t+1})$ and $\tilde{b} = (b_1, b_2, \ldots, b_{2t+1})$, output $\tilde{c} = (c_1, c_2, \ldots, c_{2t+1})$, which is calculated by the following steps:
1. Generate random bits $r_{i,j}$, where $i \ne j$ and $1 \le i < j \le 2t+1$.
2. Compute $r_{j,i} = (r_{i,j} \oplus a_i b_j) \oplus a_j b_i$, where $i \ne j$ and $1 \le i < j \le 2t+1$.
3. Compute $c_i = a_i b_i \oplus \bigoplus_{j \ne i} r_{i,j}$, where $1 \le i \le 2t+1$ and $1 \le j \le 2t+1$.
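The three steps of the AND gate can be sketched as follows. This is a software model under stated assumptions, not the FPGA implementation: the function names are hypothetical, indices are 0-based rather than 1-based, and `secrets.randbits` stands in for whatever random source the hardware uses.

```python
import secrets
from functools import reduce
from operator import xor


def encode(a, t):
    """Split bit a into 2t + 1 shares whose XOR equals a."""
    shares = [secrets.randbits(1) for _ in range(2 * t)]
    shares.append(a ^ reduce(xor, shares, 0))
    return shares


def decode(shares):
    return reduce(xor, shares, 0)


def and_gate(a_sh, b_sh):
    """ISW-style AND on two share vectors of equal length m = 2t + 1."""
    m = len(a_sh)
    r = [[0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            # Step 1: fresh random bit for every pair i < j.
            r[i][j] = secrets.randbits(1)
            # Step 2: the mirrored entry absorbs the two cross products.
            r[j][i] = (r[i][j] ^ (a_sh[i] & b_sh[j])) ^ (a_sh[j] & b_sh[i])
    # Step 3: each output share is a_i b_i XOR-ed with row i of r.
    c_sh = []
    for i in range(m):
        c_i = a_sh[i] & b_sh[i]
        for j in range(m):
            if j != i:
                c_i ^= r[i][j]
        c_sh.append(c_i)
    return c_sh


def fresh_bits_per_and(t):
    """Number of pairs i < j among 2t + 1 shares, i.e. t(2t + 1)."""
    m = 2 * t + 1
    return m * (m - 1) // 2


print(decode(and_gate(encode(1, 1), encode(1, 1))))
```

Counting the pairs in step 1 gives t(2t + 1) fresh random bits per AND gate, which is where the quadratic overhead in t comes from.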

AND Gate Example
The inputs of the AND gate are two vectors $\tilde{a} = (a_1, a_2, a_3)$ and $\tilde{b} = (b_1, b_2, b_3)$. The output $\tilde{c} = (c_1, c_2, c_3)$ is calculated as follows:
$c_1 = a_1 b_1 \oplus r_{1,2} \oplus r_{1,3}$ (1)
$c_2 = a_2 b_2 \oplus (r_{1,2} \oplus a_1 b_2) \oplus a_2 b_1 \oplus r_{2,3}$ (2)
$c_3 = a_3 b_3 \oplus (r_{1,3} \oplus a_1 b_3) \oplus a_3 b_1 \oplus (r_{2,3} \oplus a_2 b_3) \oplus a_3 b_2$ (3)
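As a quick sanity check, which is not part of the original slides, XOR-ing equations (1) to (3) shows that the three output shares recombine to the product of the unshared inputs, assuming $a = a_1 \oplus a_2 \oplus a_3$ and $b = b_1 \oplus b_2 \oplus b_3$. Each random bit appears exactly twice and cancels:

```latex
\begin{align*}
c_1 \oplus c_2 \oplus c_3
  &= a_1 b_1 \oplus a_2 b_2 \oplus a_3 b_3
     \oplus a_1 b_2 \oplus a_2 b_1
     \oplus a_1 b_3 \oplus a_3 b_1
     \oplus a_2 b_3 \oplus a_3 b_2 \\
  &\quad \oplus (r_{1,2} \oplus r_{1,2})
     \oplus (r_{1,3} \oplus r_{1,3})
     \oplus (r_{2,3} \oplus r_{2,3}) \\
  &= (a_1 \oplus a_2 \oplus a_3)(b_1 \oplus b_2 \oplus b_3) \\
  &= a\,b .
\end{align*}
```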

CAD Optimization
[Figure: t = 1 private circuit for the third coordinate of AND, mapped onto 4-input LUTs. One LUT takes $a_3$, $b_1$, $b_2$, $b_3$ and computes $a_3 b_3 \oplus a_3 b_1 \oplus a_3 b_2 = a_3 b$, which causes leakage. The remaining terms ($a_1 b_3$, $a_2 b_3$, $r_{1,3}$, $r_{2,3}$) are handled by two further 4-input LUTs that produce $c_3$.]
With $x = a_3 b$ denoting the output of the first LUT:
$p(b = 0 \mid x = 0) = 2/3$, $p(b = 1 \mid x = 0) = 1/3$,
$p(b = 0 \mid x = 1) = 0$, $p(b = 1 \mid x = 1) = 1$. (4)
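The probabilities in equation (4) can be reproduced by exhaustive enumeration. The sketch below assumes that x is the output of the LUT that merges the three products involving a3, and that a3 and the shares of b are uniform and independent; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def posterior_b_given_x():
    """Return p(b | x) for x = a3*b3 ^ a3*b1 ^ a3*b2 over uniform shares."""
    counts = {}
    for a3, b1, b2, b3 in product((0, 1), repeat=4):
        b = b1 ^ b2 ^ b3  # the unshared sensitive bit
        x = (a3 & b3) ^ (a3 & b1) ^ (a3 & b2)  # value held by one LUT
        counts[(b, x)] = counts.get((b, x), 0) + 1
    table = {}
    for x in (0, 1):
        total = sum(counts.get((b, x), 0) for b in (0, 1))
        for b in (0, 1):
            table[(b, x)] = Fraction(counts.get((b, x), 0), total)
    return table


for (b, x), p in sorted(posterior_b_given_x().items()):
    print(f"p(b={b} | x={x}) = {p}")
```

Because the posterior differs from the uniform prior on b, a single probe on that LUT output already reveals information about the unshared bit.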

Delay in Random Variables
There are two ways in which random variables can be provided to the private circuit: as external inputs or from a random number generator (RNG). Generally, random numbers are provided to the circuit from an RNG.
$c_3 = a_3 b_3 \oplus (r_{1,3} \oplus a_1 b_3) \oplus a_3 b_1 \oplus (r_{2,3} \oplus a_2 b_3) \oplus a_3 b_2$ (5)
A delay in the arrival of the random bits $r_{1,3}$, $r_{2,3}$, $a_1$ and $a_2$ leads to information leakage.
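The effect of late random bits on equation (5) can be illustrated with a deliberately simplified model. It assumes that, during the transient, the late bits r13 and r23 still hold the constant 0 while the shares of a and b have already arrived. This is an assumption made for illustration, not the timing model of the talk, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def c3(a1, a2, a3, b1, b2, b3, r13, r23):
    """Third output share of the t = 1 AND gate, as in equation (5)."""
    return ((a3 & b3) ^ (r13 ^ (a1 & b3)) ^ (a3 & b1)
            ^ (r23 ^ (a2 & b3)) ^ (a3 & b2))


def prob_c3_is_one(a, b, randoms_late):
    """P(c3 = 1) for fixed secrets a, b over all share choices.

    If randoms_late is True, r13 and r23 are held at 0 (assumed model);
    otherwise they are enumerated as uniform bits.
    """
    r_values = [(0, 0)] if randoms_late else list(product((0, 1), repeat=2))
    ones = total = 0
    for a1, a2, b1, b2 in product((0, 1), repeat=4):
        a3, b3 = a ^ a1 ^ a2, b ^ b1 ^ b2
        for r13, r23 in r_values:
            ones += c3(a1, a2, a3, b1, b2, b3, r13, r23)
            total += 1
    return Fraction(ones, total)


for a, b in product((0, 1), repeat=2):
    print(a, b, prob_c3_is_one(a, b, False), prob_c3_is_one(a, b, True))
```

With the random bits on time, c3 is unbiased for every (a, b). In the late model its bias depends on the product ab, so the transient value of a single wire depends on the secret.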

Experimental Setup
A parallel implementation of the SIMON32/64 crypto-core, running at a clock frequency of 24 MHz, along with a simple UART interface, is used to test our design on the Xilinx Virtex-5 XC5VLX30 FPGA of the SASEBO-GII platform.
For t = 1, the total number of random bits required by SIMON is 272; for t = 2 and t = 3, the number of required random bits becomes 608 and 1008, respectively.
Random numbers are generated by a maximal-length LFSR.
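A maximal-length LFSR can be sketched as below. The slides do not give the width or the feedback polynomial used, so the 16-bit Fibonacci LFSR with polynomial x^16 + x^14 + x^13 + x^11 + 1 and the seed 0xACE1 are assumptions chosen only because this is a well-known maximal-length example; the function names are hypothetical.

```python
def lfsr_step(state):
    """One step of a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11)."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def random_bits(seed, count):
    """Collect `count` output bits (the low bit of each state)."""
    state, bits = seed, []
    for _ in range(count):
        bits.append(state & 1)
        state = lfsr_step(state)
    return bits


def period(seed):
    """Number of steps before the state returns to the (nonzero) seed."""
    state, steps = lfsr_step(seed), 1
    while state != seed:
        state = lfsr_step(state)
        steps += 1
    return steps


print(period(0xACE1))               # 65535 = 2**16 - 1 for a maximal-length LFSR
print(random_bits(0xACE1, 272)[:16])  # first few of the 272 bits needed for t = 1
```

Maximal length means the register visits every nonzero state before repeating, so the mask bits do not cycle early.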

Result: Optimized SIMON
[Figure: (a) TVLA plot: TVLA value of optimized SIMON against the safe TVLA value, over sample points. (b) Average key ranking versus number of traces (in thousands). (c) Correlation value versus sample points for the wrong and correct key guesses.]
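The key-ranking and correlation panels come from correlation power analysis. A toy version of the idea is sketched below. Everything in it is an assumption for illustration: a 4-bit key, a simulated single-sample leakage equal to the Hamming weight of plaintext XOR key plus Gaussian noise, and hypothetical function names. It is not the SIMON leakage model or the EM setup used in the talk.

```python
import random
from math import sqrt


def hamming_weight(value):
    return bin(value).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y) if var_x and var_y else 0.0


def simulate(key, n_traces, noise, rng):
    """Simulated leakage: HW(plaintext ^ key) plus Gaussian noise."""
    plaintexts = [rng.randrange(16) for _ in range(n_traces)]
    leakage = [hamming_weight(p ^ key) + rng.gauss(0, noise)
               for p in plaintexts]
    return plaintexts, leakage


def cpa(plaintexts, leakage):
    """Correlate every key guess's predicted leakage with the traces."""
    scores = {guess: pearson([hamming_weight(p ^ guess) for p in plaintexts],
                             leakage)
              for guess in range(16)}
    return max(scores, key=scores.get), scores


def key_rank(scores, key):
    """Position of the true key among guesses sorted by score (1 = best)."""
    return sorted(scores, key=scores.get, reverse=True).index(key) + 1


rng = random.Random(1)
pts, leak = simulate(0xB, 2000, 1.0, rng)
best_guess, scores = cpa(pts, leak)
print(hex(best_guess), key_rank(scores, 0xB))
```

A low average key rank means the correct key stands out among the guesses, which is how the key-ranking panels are interpreted.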

Result: 2-input LUT based SIMON
[Figure: (a) TVLA plot: TVLA value of 2-input LUT SIMON against the safe TVLA value, over sample points. (b) Average key ranking versus number of traces (in thousands). (c) Correlation value versus sample points for the wrong and correct key guesses.]

Result: Synchronized 2-input LUT based SIMON
[Figure: Side Channel Analysis of Synchronized 2 input LUT SIMON. (a) TVLA plot: TVLA value of synchronized SIMON against the safe TVLA value, over sample points, with the first round output marked. (b) Avg. Key Rank P and (c) Correlation Value P. (d) Avg. Key Rank 1 and (e) Correlation Value 1. Key-rank panels are plotted against the number of traces (in thousands); correlation panels show the wrong and correct key guesses over sample points.]

Attack Summary
Table: Summary of Side Channel Analysis

| Design Name | TVLA Test | Avg. Key Ranking | Remarks |
|---|---|---|---|
| Optimized SIMON | Fails, significant information leakage | Key ranking is low, successful attack | Not secure |
| 2-input LUT based SIMON | Fails, but less information leakage compared to optimized SIMON | Key ranking is high, attack is not successful | Secure against CPA, could be broken by a better model |
| Synchronized 2-input LUT based SIMON | Passes: no leakage at first round. Initial peaks are caused by plaintext loading | Key ranking is high, attack is not successful | Secure |

Resource Comparison

| Name | LUTs | Registers | Slices | Freq. (MHz) | Clock Cycles |
|---|---|---|---|---|---|
| Optimized SIMON | 761 (1×) | 805 (1×) | 595 (1×) | 147 (1×) | 32 (1×) |
| 2 i/p LUT based SIMON | 1305 (1.71×) | 805 (1×) | 1241 (2.08×) | 88 (0.59×) | 32 (1×) |
| Synchronized 2 i/p LUT based SIMON | 1309 (1.71×) | 2920 (3.62×) | 4090 (6.87×) | 104 (0.70×) | 288 (9×) |
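One way to read the frequency and cycle columns together is to convert them into a latency per encryption. The sketch below assumes that the Freq. column is the maximum achievable clock and that each design runs at that clock, which is an assumption (the experiments themselves ran at 24 MHz). The dictionary and function names are hypothetical.

```python
# (frequency in MHz, clock cycles per encryption), copied from the table.
DESIGNS = {
    "Optimized SIMON": (147, 32),
    "2 i/p LUT based SIMON": (88, 32),
    "Synchronized 2 i/p LUT based SIMON": (104, 288),
}


def latency_us(freq_mhz, cycles):
    """Latency in microseconds: cycles divided by cycles-per-microsecond."""
    return cycles / freq_mhz


def slowdown(name, baseline="Optimized SIMON"):
    """Latency of a design relative to the baseline design."""
    return latency_us(*DESIGNS[name]) / latency_us(*DESIGNS[baseline])


for name, (freq, cycles) in DESIGNS.items():
    print(f"{name}: {latency_us(freq, cycles):.3f} us, "
          f"{slowdown(name):.2f}x the optimized latency")
```

Under these assumptions the synchronized design takes roughly 12.7 times as long per encryption as the optimized one, on top of its area overhead.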

Conclusion
We analyzed private circuits at the implementation level on a SIMON crypto-processor.
Our results show that it is very easy for a CAD tool to override the basic requirements of private circuits.
Practical evaluations indicate that, with proper constraints, the leakage can be reduced.
Moreover, by synchronizing each gate, we remove glitches and delays and come much closer to the theoretical evaluation of private circuits, but at a huge overhead.