GALS System Design. Side Channel Attack Secure Cryptographic Accelerators. Frank K. Gürkaynak. Integrated Systems Laboratory ETH Zurich


GALS System Design: Side-Channel Attack Secure Cryptographic Accelerators. Frank K. Gürkaynak, Integrated Systems Laboratory, ETH Zurich. Ph.D. thesis presentation, 22 November 2005.

Outline
1. Introduction
2. Globally-Asynchronous Locally-Synchronous (GALS) design
3. Cryptography
4. GALS implementation of the AES algorithm
5. Results and conclusions
GALS System Design, kgf, Integrated Systems Laboratory (IIS)


Motivation: what is wrong with the way we design chips now?
Modern system-on-chip circuits:
- contain millions of transistors,
- require clock rates exceeding hundreds of MHz,
- include hundreds of subblocks,
- use tens of different clock domains,
and every one of these numbers is increasing. Such circuits are not easy to design:
- The clock signal must be distributed to an increasing number of elements with increased precision.
- Many independently designed components must be combined into a large system.
- All subsystems must be able to exchange data reliably.


Globally-Asynchronous Locally-Synchronous design
GALS is a methodology that enables the design of complex digital systems on chip:
- The system is divided into smaller GALS modules.
- Each module works synchronously.
- Interconnected modules communicate asynchronously.
GALS was first proposed by D. Chapiro in 1984; the first chip implementation was by J. Muttersbach in 1999. GALS implementations differ in the synchronization method between blocks and in the specific asynchronous communication protocol used.

Basic GALS structure
1. Synchronous system: two large functional blocks, Synchronous Block A and Synchronous Block B, exchange data and share a common clock (Clk).
2. Local clock generators: GALS modules are formed by adding a local clock generator (producing Lclk) to each functional block, which becomes a locally synchronous island.
3. GALS system: port controllers are added to regulate data transfers between GALS modules. A data-out port on module A and a data-in port on module B exchange Req/Ack; each port uses Pen/Ta towards its island and Ri/Ai towards its local clock generator.

GALS work at IIS
- J. Muttersbach: first implementation
- T. Villiger: multi-point interconnect
- S. Oetiker: local clock generators
- F. K. Gürkaynak: design and test flow

Why GALS? Advantages:
- No global clock distribution problems
- Modular design flow
- Potential for low-power design
- New possibilities for designers

Cryptography 101
Private-key ciphers: Alice encrypts plain-text information using a cipher-key. Bob can decrypt the resulting cipher-text only if he has access to the same cipher-key. (Figure: Alice encrypts plain-text into cipher-text, Bob decrypts it with the same cipher-key, and Oscar sits on the channel between them.)


Cryptography 102: security
- Oscar wishes to obtain the plain-text.
- Oscar knows everything about the cryptographic algorithm.
- Oscar can observe and modify the cipher-text, but
- Oscar does not know the cipher-key.

Advanced Encryption Standard (AES)
- Standardized by NIST in 2001
- 128-bit data block
- 128-bit key (192- and 256-bit keys are also defined), with 10, 12 or 14 rounds respectively
- Round components: ShiftRows, AddRoundKey, SubBytes, MixColumns
(Figure: plaintext and round key enter a 128-bit state register holding the bytes S00 to S33, which is transformed by ShiftRows, AddRoundKey, SubBytes and MixColumns until the ciphertext is produced.)
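Two of the round components listed above are simple enough to sketch directly on a 4×4 byte state. This is an illustrative sketch only, assuming the FIPS-197 layout where `state[r][c]` is the byte in row r and column c; the function names are hypothetical and not taken from the presentation.

```python
# Sketch of two AES round operations on a 4x4 byte state.
# Assumption: state[r][c] is the byte in row r, column c (FIPS-197 layout).

def shift_rows(state):
    """Rotate row r of the state left by r byte positions."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]

def add_round_key(state, round_key):
    """XOR every state byte with the corresponding round-key byte."""
    return [[s ^ k for s, k in zip(srow, krow)]
            for srow, krow in zip(state, round_key)]

def num_rounds(key_bits):
    """Number of AES rounds for a given key length in bits."""
    return {128: 10, 192: 12, 256: 14}[key_bits]

if __name__ == "__main__":
    # Fill the state column by column: byte index i = 4*c + r.
    state = [[4 * c + r for c in range(4)] for r in range(4)]
    print(shift_rows(state)[1])  # [5, 9, 13, 1]: row 1 rotated left by one
```

ShiftRows only moves bytes and AddRoundKey is a plain XOR; the nonlinear work sits in SubBytes, which is why the SubBytes unit dominates the hardware cost discussed later.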

Side channels
Once an otherwise secure algorithm is implemented in hardware or software, it gains physical properties that can be observed:
- the time required to finish the operation
- power consumption
- electromagnetic radiation
- heat dissipation
- sound
These properties are called side channels.
Side-channel attacks: in 1996, P. Kocher showed that observing these side channels can reveal additional information about the cipher-key.

Differential Power Analysis (DPA)
1. Select a subkey and a target operation.
2. Use a simple power model of the cryptographic hardware to predict the power consumption for S input vectors.
3. Predict the power consumption for all K subkey candidates, giving a matrix of hypotheses H(k, s) with one row per subkey and one column per input vector.
4. Measure the power consumption P(s) of the hardware under attack using the same S input vectors.
5. Statistically evaluate model against measurement: determine whether one of the power hypotheses shows a distinctly higher correlation to the measurement than the others.
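The five steps above can be sketched as a correlation-based attack on simulated measurements. Everything device-specific here is an assumption: the hypothetical device leaks the Hamming weight of the AddRoundKey output byte plus Gaussian noise, and the trace count, noise level and function names are illustrative. The secret 0x73 simply echoes the key labelled on the "DPA attacks work" slide.

```python
import random

def hamming_weight(x):
    return bin(x).count("1")

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate_measurements(inputs, secret_key, noise_sd, rng):
    """Step 4 on a hypothetical device: HW(p ^ key) plus Gaussian noise."""
    return [hamming_weight(p ^ secret_key) + rng.gauss(0.0, noise_sd)
            for p in inputs]

def dpa_attack(inputs, measurements):
    """Steps 2, 3 and 5: build H(k, s) for all 256 subkeys and correlate."""
    scores = []
    for k in range(256):
        hypothesis = [hamming_weight(p ^ k) for p in inputs]
        scores.append(pearson(hypothesis, measurements))
    best = max(range(256), key=lambda k: scores[k])
    return best, scores

if __name__ == "__main__":
    rng = random.Random(1)
    inputs = [rng.randrange(256) for _ in range(1000)]   # S = 1000 vectors
    traces = simulate_measurements(inputs, 0x73, 1.0, rng)
    key, scores = dpa_attack(inputs, traces)
    print(hex(key), round(scores[key], 2))
```

Because XOR is linear, wrong subkeys that differ from the right one in few bits still correlate strongly with this model; practical attacks therefore usually target a nonlinear operation such as the SubBytes output, which separates the hypotheses much more clearly.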

Block diagram
The GALS implementation is called Acacia. Operations are divided between a 128-bit Goliath unit and a 32-bit David unit:
- Goliath: key generator, round key memory, interface, 128-bit register, AddRoundKey and ShiftRows
- David: 32-bit register, SubBytes and MixColumns
David and Goliath are separate GALS modules, each with its own local clock generator, connected through ports labelled g2d, d2g and g2s. A second David unit runs in parallel. One round of AES requires 1 Goliath and 4 David operations. Externally, the chip has a clock, handshake and 16-bit data interface.

Implemented countermeasures: normal operation
In normal operation, AddRoundKey/ShiftRows is followed by the SubBytes operations S01 to S16, with one MixColumns operation (MixC 1 to MixC 4) after every group of four. The attacker normally targets a single operation and measures the power consumption of that particular clock cycle.

Inserting dummy operations
Inserting random dummy cycles confuses the attacker, since the targeted operation is no longer always executed in the same clock cycle. Unfortunately, this also increases the run-time.
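The effect of random dummy cycles can be sketched with a schedule list. This is a minimal sketch assuming dummies may be inserted anywhere in the sequence of SubBytes operations S01 to S16 (MixColumns is omitted for brevity); the names and the number of dummies are hypothetical.

```python
import random

# SubBytes operations S01..S16 in their normal order (MixColumns omitted).
REAL_OPS = ["S%02d" % i for i in range(1, 17)]

def schedule_with_dummies(ops, n_dummies, rng):
    """Insert n_dummies dummy cycles at random positions of the schedule."""
    schedule = list(ops)
    for _ in range(n_dummies):
        schedule.insert(rng.randrange(len(schedule) + 1), "Dummy")
    return schedule

def cycle_of(schedule, op):
    """Clock cycle (0-based) in which op is executed."""
    return schedule.index(op)

if __name__ == "__main__":
    rng = random.Random(0)
    cycles = {cycle_of(schedule_with_dummies(REAL_OPS, 4, rng), "S05")
              for _ in range(1000)}
    print(sorted(cycles))  # S05 no longer sits in one fixed cycle
```

The cost is visible in the schedule length: every dummy adds one cycle, so four dummies stretch 16 cycles to 20.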

Changing the order of operations
Independent operations can be re-ordered arbitrarily: in the figure, the columns are processed in random order and the four SubBytes operations of each column are shuffled before that column's MixColumns. Unlike inserting dummy cycles, this does not increase the run-time.
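The re-ordering can be sketched as a randomized schedule that keeps the data dependencies intact. The sketch assumes, as the figure suggests, that column c needs SubBytes S(4c+1) to S(4c+4) before its MixColumns; the function name is hypothetical.

```python
import random

def randomized_round_schedule(rng):
    """Random column order, random SubBytes order within each column;
    each column's MixColumns follows its own four SubBytes operations."""
    columns = list(range(4))
    rng.shuffle(columns)
    schedule = []
    for c in columns:
        sbox_ops = ["S%02d" % (4 * c + i) for i in range(1, 5)]
        rng.shuffle(sbox_ops)
        schedule.extend(sbox_ops)
        schedule.append("MixC %d" % (c + 1))
    return schedule

if __name__ == "__main__":
    rng = random.Random(2)
    print(randomized_round_schedule(rng))
    print(randomized_round_schedule(rng))  # same 20 cycles, different order
```

Every schedule has exactly 20 entries, which is why re-ordering, unlike dummy insertion, costs no run-time.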

Parallelization
Executing operations in parallel creates more switching activity at the same time, which appears as noise to the attacker. (Figure: the SubBytes and MixColumns operations are split over two parallel units, and idle slots are filled with dummy operations.)

Introducing GALS modules
Goliath and the two David units (labelled "David Perels" and "David Tschopp" in the figure) are GALS modules. Each has its own local clock generator, so their clocks are independent and cannot be controlled by the attacker.

Variable clock periods
Each GALS module can randomly change its own clock period. This adds even more uncertainty.
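Varying the clock period spreads the target operation in time even when its cycle index is known. The sketch below assumes every cycle draws its period from a hypothetical set of values; the slides do not state the actual range of periods.

```python
import random

# Hypothetical clock periods in ns; the actual range is not given in the slides.
PERIODS_NS = [4.0, 4.5, 5.0, 5.5, 6.0]

def start_time_ns(target_cycle, rng):
    """Start time of cycle `target_cycle` when each preceding cycle
    uses a randomly chosen clock period."""
    return sum(rng.choice(PERIODS_NS) for _ in range(target_cycle))

if __name__ == "__main__":
    rng = random.Random(3)
    times = [start_time_ns(4, rng) for _ in range(1000)]
    print(min(times), max(times))  # fixed cycle index, varying wall-clock time
```

An attacker who aligns traces on a fixed time offset therefore samples different operations in different runs, on top of the uncertainty already added by dummies and re-ordering.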

Final chip photo: Acacia
- Technology: UMC 0.25 µm CMOS
- Total area: 1.75 mm²; David 0.221 mm², Goliath 0.687 mm², synchronous part 0.584 mm²
- Throughput: 177.7 Mb/s
- Energy: 1.232 mJ/Mb
Part of the chip is occupied by two independent AES designs, Baby and Pampers. (Photo labels: synchronous interface and reference design, three clock generators, two David units, Goliath, and the g2s, g2d and d2g ports.)


Conclusions
- A novel GALS-based crypto ASIC implementing the AES algorithm was presented.
- In addition to traditional DPA countermeasures, the chip includes GALS modules that use randomly varying clocks, which make known attacks extremely difficult.
- The GALS design methodology was refined; the design was implemented mainly with standard EDA tools.
- A combination of functional and scan-chain-based testing achieves a stuck-at fault coverage of more than 99.8%.
Is this really secure? We don't know yet. The security has to be evaluated by cryptanalysts.

Questions?
Acknowledgements for GALS: Stephan Oetiker, Thomas Villiger, Hubert Kaeslin, Norbert Felber
Acknowledgements for crypto chips: Stefan Achleitner, Gérard Basler, Andres Erni, Dominique Gasser, Peter Haldi, Franco Hug, Adrian Lutz, Norbert Pramstaller, Stefan Reichmuth, Pieter Rommens, Jürg Treichler, Stefan Zwicky and Andreas Burg, Matthias Braendli, Stefan Eberli, Simon Haene, Stefan Mangard, Elisabeth Oswald, S. Berna Örs

Timing diagram (receiving GALS module; signals Pen, Ta, Req, Ack, Ri, Ai and the island clock)
1. The GALS port is activated by the Pen signal, which enables Req.
2. The receiving GALS module sets Ack, and the clock pause request Ri is set.
3. The clock pause is acknowledged by Ai; the locally synchronous island is paused.
4. The data transfer is complete: Ta is set and Req is reset.
5. All handshake signals return to their initial values and the local clock is released.
6. Normal operation resumes; Ta remains active until Pen is reset.
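The six steps above can be traced as a sequence of signal levels. This is a hypothetical event-level model, not the port controller's actual implementation: it records the signal values after each step and assumes the island clock is paused exactly while Ai is high, as step 3 suggests. Step 6 is modelled as the later moment when Pen is reset and Ta is released.

```python
# Hypothetical event-level model of the port handshake.
# Assumption: the island clock is paused while Ai is high.

SIGNALS = ("Pen", "Req", "Ack", "Ri", "Ai", "Ta")

def handshake_trace():
    s = dict.fromkeys(SIGNALS, 0)
    trace = []

    def step(label, **changes):
        s.update(changes)
        clock_running = s["Ai"] == 0
        trace.append((label, dict(s), clock_running))

    step("1 port enabled", Pen=1, Req=1)
    step("2 Ack set, pause requested", Ack=1, Ri=1)
    step("3 pause acknowledged", Ai=1)
    step("4 transfer complete", Ta=1, Req=0)
    step("5 handshake reset, clock released", Ack=0, Ri=0, Ai=0)
    step("6 Pen reset, Ta released", Pen=0, Ta=0)
    return trace

if __name__ == "__main__":
    for label, levels, running in handshake_trace():
        print(label, levels, "clock running" if running else "clock paused")
```

The useful property to check is that the transfer (step 4) happens only while the clock is paused, so the receiving island never samples data in the middle of a handshake.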

Local clock generator (schematic): two MUTEX elements, one per Ri/Ai request pair; a Muller-C element; a delay register; and a programmable delay line built from delay slices plus a fine-delay stage, controlled by DelCntrlxSI(0) to DelCntrlxSI(15). Signals: LClk, ExtClk, ClkToDelayxCI, ClkDelayedxCO.
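A rough timing model helps to see how a programmable delay line sets the local clock period. This is a hypothetical sketch: it assumes a ring-oscillator-style loop in which the clock toggles each time an edge traverses the loop (so the period is twice the loop delay), that each enabled delay slice adds a fixed delay, and that up to 16 slices can be enabled. All delay values are invented for illustration.

```python
# Hypothetical timing model of a ring-oscillator style local clock generator.
# Assumptions (not from the slides): period = 2 x loop delay, each enabled
# slice adds T_SLICE_NS, and at most 16 slices can be enabled.

T_SLICE_NS = 0.3   # hypothetical delay per delay slice
T_LOOP_NS = 0.4    # hypothetical fixed delay (Muller-C element, mux, wiring)

def loop_delay_ns(n_slices):
    return T_LOOP_NS + n_slices * T_SLICE_NS

def clock_period_ns(n_slices):
    # The clock toggles once per loop traversal, so one period is two traversals.
    return 2 * loop_delay_ns(n_slices)

def slices_for_period(target_ns, max_slices=16):
    """Smallest number of enabled slices whose period is at least target_ns."""
    for n in range(max_slices + 1):
        if clock_period_ns(n) >= target_ns:
            return n
    raise ValueError("target period not reachable")

if __name__ == "__main__":
    n = slices_for_period(4.0)
    print(n, round(clock_period_ns(n), 2))
```

Rewriting the delay setting between cycles is one way such a generator could produce the randomly varying clock periods used as a countermeasure.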

Mutual exclusion element (MUTEX): transistor-level schematic with request inputs R1, R2 and grant outputs G1, G2 (transistor sizes W/L from 1.28 µm/0.24 µm to 3.9 µm/0.24 µm).
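At the behavioural level, a MUTEX grants at most one of two requests and holds a grant until that request is withdrawn. The sketch below is hypothetical: it models the arbitration outcome for simultaneous requests as a random choice, whereas the real circuit resolves that race (and any metastability) electrically. In pausable clock generators such an element typically arbitrates between a pause request and the next clock edge.

```python
import random

class Mutex:
    """Behavioural model of a two-input mutual exclusion element.
    At most one grant is active; simultaneous requests are resolved
    arbitrarily (modelled here with a random choice)."""

    def __init__(self, rng=None):
        self.rng = rng if rng is not None else random.Random()
        self.g1 = 0
        self.g2 = 0

    def update(self, r1, r2):
        # A grant is held until its request is withdrawn.
        if self.g1 and not r1:
            self.g1 = 0
        if self.g2 and not r2:
            self.g2 = 0
        # Only a free MUTEX can issue a new grant.
        if not (self.g1 or self.g2):
            if r1 and r2:
                if self.rng.random() < 0.5:
                    self.g1 = 1
                else:
                    self.g2 = 1
            elif r1:
                self.g1 = 1
            elif r2:
                self.g2 = 1
        return self.g1, self.g2

if __name__ == "__main__":
    m = Mutex(random.Random(0))
    print(m.update(1, 0), m.update(1, 1), m.update(0, 1), m.update(0, 0))
```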

Design flow for GALS (as used in Shir-Khan)
- Local clock generator: signal transition graph (.stg) → 3D (UCSD) → logic equations (.eqn) → eqn2gate (ETHZ), using a self-timed library → gate-level netlist (.v)
- Micro-controller and GALS modules: VHDL source code (.vhdl) → Design Analyzer (Synopsys) → gate-level netlist (.v) → Silicon Ensemble (Cadence) → macrocell layout (.lef) → Pearl timing analysis (Cadence) → timing model (.tlf)
- GALS module assembly: configuration file (.gals) → GALS module assembler (ETHZ) → VHDL source code and control scripts (.scr), followed by the same Synopsys and Cadence steps
- GALS system: VHDL source code → Design Analyzer (Synopsys) → gate-level netlist → Silicon Ensemble (Cadence) → Pearl timing analysis (Cadence) → Calibre DRC/LVS (Mentor) → final layout (.gds)

AES implementations at IIS
- Riddler: 2 × 128-bit parallel datapath; 2.16 Gb/s (pipelined); 37.8 mm² (0.6 µm); en/decryption (ECB)
- Fastcore: 128-bit datapath; 2.12 Gb/s; 3.56 mm² (0.25 µm); en/decryption (all); independent encryption and decryption
- Ares: 128/32-bit datapath; 1.15 Gb/s (128 bit); 1.2 mm² (0.25 µm); encryption (ECB/OFB); includes masking
- Baby / Pampers: 16-bit datapath; 0.285 / 0.230 Gb/s; 0.35 / 0.58 mm² (0.25 µm); encryption (ECB/OFB); plain / with countermeasures

SubBytes determines AES performance
Datapath width: 8 / 16 / 32 / 64 / 128 bit
Parallel SubBytes units: 1 / 2 / 4 / 8 / 16
Complexity (gate equivalents): 5,052 / 6,281 / 7,155 / 11,628 / 20,410
Area (normalized): 1 / 1.266 / 1.472 / 2.432 / 4.269
Clock cycles for AES-128: 160 / 80 / 40 / 20 / 10
Critical path (normalized): 1.349 / 1.341 / 1.206 / 1.133 / 1
Total time (normalized): 21.580 / 10.729 / 4.825 / 2.227 / 1
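The table can be cross-checked under one interpretation that the slide does not state explicitly: normalized total time equals clock cycles times normalized critical path, relative to the 128-bit design. With that reading, the 8-, 16-, 32- and 128-bit columns agree to within rounding, while the 64-bit column gives 2.266 rather than the listed 2.227. Multiplying area by the reported time also shows which width is most area-efficient. The constant and function names are illustrative.

```python
# Values copied from the "SubBytes determines AES performance" table.
WIDTHS = [8, 16, 32, 64, 128]
CYCLES = [160, 80, 40, 20, 10]
CRIT_PATH = [1.349, 1.341, 1.206, 1.133, 1.0]    # normalized to 128-bit design
AREA = [1.0, 1.266, 1.472, 2.432, 4.269]         # normalized to 8-bit design
REPORTED_TIME = [21.580, 10.729, 4.825, 2.227, 1.0]

def total_time(cycles, crit_path, ref_cycles=10, ref_crit=1.0):
    """Assumed model: cycles x critical path, relative to the 128-bit design."""
    return cycles * crit_path / (ref_cycles * ref_crit)

def area_time(i):
    """Area-time product of column i, using the reported total time."""
    return AREA[i] * REPORTED_TIME[i]

if __name__ == "__main__":
    for i, w in enumerate(WIDTHS):
        print(w, round(total_time(CYCLES[i], CRIT_PATH[i]), 3),
              REPORTED_TIME[i], round(area_time(i), 2))
```

The area-time products fall from about 21.6 for the 8-bit datapath to about 4.3 for the 128-bit one, so wider datapaths buy speed more cheaply than their area growth suggests.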

Countermeasures against DPA attacks: protect your weak spots
- DPA measures power consumption: add noise (unrelated switching activity) to confuse the measurements.
- DPA targets a specific operation: change the order of operations by inserting random operations.
- The power consumption of CMOS is data dependent: use alternative logic styles.
- There are direct operations between input and output: prevent them by masking the key with random data.
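Masking works because XOR-ing a random mask into the key does not disturb AddRoundKey, while the non-linear S-box needs a lookup table recomputed for the current mask. A minimal sketch of first-order Boolean masking under that assumption (not necessarily the scheme used in Ares or Acacia), with the 4-bit PRESENT S-box standing in for the AES S-box only to keep it short; function names are hypothetical:

```python
import secrets

# 4-bit PRESENT S-box: compact stand-in for the 8-bit AES S-box (assumption)
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def masked_sbox_table(m_in, m_out):
    """Recomputed table T with T[x ^ m_in] == SBOX[x] ^ m_out for every x."""
    return [SBOX[y ^ m_in] ^ m_out for y in range(16)]


def masked_key_add_and_sub(p, k):
    """AddRoundKey + SubBytes on one nibble without forming p ^ k in the clear."""
    r = secrets.randbelow(16)       # fresh random mask, applied to the key
    m_out = secrets.randbelow(16)   # fresh output mask for the S-box
    masked_key = k ^ r              # the key enters the datapath only masked
    state = p ^ masked_key          # equals (p ^ k) ^ r
    table = masked_sbox_table(r, m_out)
    out = table[state]              # equals SBOX[p ^ k] ^ m_out
    return out ^ m_out              # unmask (a real design does this as late as possible)


def reference(p, k):
    """Unprotected AddRoundKey + SubBytes for comparison."""
    return SBOX[p ^ k]
```

Python itself offers no side-channel protection; the sketch only shows that the masked path computes the same result while the S-box input p ^ k is never formed unmasked.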

DPA attacks work
[Figure: correlation (-0.25 to 0.25) versus number of measurements (up to 10,000); the curve for key 0x73 is labelled.]
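A plot of correlation against key guesses is what a correlation-based DPA produces: the measured power is correlated with a leakage model for every key guess, and the correct guess separates from the rest as measurements accumulate. A minimal simulated sketch, assuming Hamming-weight leakage of an S-box output; it uses the 4-bit PRESENT S-box instead of the AES S-box only to stay short, and the traces are synthetic, not chip measurements:

```python
import random

# 4-bit PRESENT S-box: compact stand-in for the 8-bit AES S-box (assumption)
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hamming_weight(x):
    return bin(x).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_traces(key, n, noise=1.0, seed=1):
    """One leakage sample per encryption: HW(SBOX[p ^ key]) plus Gaussian noise."""
    rng = random.Random(seed)
    plaintexts = [rng.randrange(16) for _ in range(n)]
    leakage = [hamming_weight(SBOX[p ^ key]) + rng.gauss(0.0, noise)
               for p in plaintexts]
    return plaintexts, leakage


def cpa(plaintexts, leakage):
    """Correlate the leakage with the predicted Hamming weight for every key guess."""
    scores = {}
    for guess in range(16):
        model = [hamming_weight(SBOX[p ^ guess]) for p in plaintexts]
        scores[guess] = abs(pearson(model, leakage))
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    pts, leak = simulate_traces(key=0x7, n=2000)
    best, scores = cpa(pts, leak)
    print(f"best guess 0x{best:X}, correlation {scores[best]:.3f}")
```

Running `cpa` on growing prefixes of the simulated traces would trace out curves similar in spirit to the slide's plot.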

DPA attack setup

Area overhead of GALS

                      David                Goliath
Area [µm²]            183,007   92.98%     551,194   96.66%
Area [µm²], LFSRs      26,928   13.68%      73,512   12.89%
Area [µm²], ClockGen    7,579    3.85%       7,626    1.34%
Area [µm²], Ports       6,225    3.16%      11,412    2.00%
Area [µm²], GALS      196,811  100.00%     570,233  100.00%
Area [µm²], TOTAL     963,855
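In this table the first row plus ClockGen plus Ports adds up to the GALS row (exactly for David, 1 µm² off for Goliath), so the GALS-specific overhead can be read as ClockGen + Ports. A short sketch under that reading; "core" is our hypothetical name for the unlabelled first row:

```python
# Figures copied from the "Area overhead of GALS" table (µm²)
AREA_UM2 = {
    "David":   {"core": 183_007, "clockgen": 7_579, "ports": 6_225, "gals": 196_811},
    "Goliath": {"core": 551_194, "clockgen": 7_626, "ports": 11_412, "gals": 570_233},
}


def wrapper_overhead_percent(module):
    """Share of the GALS module taken by the local clock generator and the ports."""
    m = AREA_UM2[module]
    return 100.0 * (m["clockgen"] + m["ports"]) / m["gals"]


def sum_mismatch(module):
    """GALS row minus (core + ClockGen + Ports), in µm²."""
    m = AREA_UM2[module]
    return m["gals"] - (m["core"] + m["clockgen"] + m["ports"])


for name in AREA_UM2:
    print(f"{name}: overhead {wrapper_overhead_percent(name):.2f} %, "
          f"mismatch {sum_mismatch(name)} µm²")
```

The result (about 7.0 % for David, 3.3 % for Goliath) is the complement of the first-row percentages, which is consistent with this reading.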

Latency overhead of GALS

                       Synchronous          GALS+DPA
                       David    Goliath     David    Goliath
Critical path (ns)     5.43     5.84        3.98     5.27
Latency (cycles)       3        1           4        2
Clock freq. (MHz)      170.96               250.8    189.6
Enc. (clock cycles)    7                    8        2
Enc. time (ns)         40.88                42.38
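The encryption times in the table can be reproduced from the critical paths: the synchronous design uses one clock, so its slowest path (5.84 ns) sets the period for all 7 cycles, while in the GALS version each module runs on its own clock and the cycles spent in each are added. A sketch of that arithmetic, based on our reading of the flattened table:

```python
# Critical paths [ns] from the "Latency overhead of GALS" table
SYNC_CRIT_NS = {"David": 5.43, "Goliath": 5.84}
GALS_CRIT_NS = {"David": 3.98, "Goliath": 5.27}


def synchronous_enc_time(cycles=7):
    # A single clock serves both modules, so the slowest critical path sets the period.
    period = max(SYNC_CRIT_NS.values())
    return cycles * period


def gals_enc_time(david_cycles=8, goliath_cycles=2):
    # Each module runs on its own local clock; the time spent in each is added.
    return (david_cycles * GALS_CRIT_NS["David"]
            + goliath_cycles * GALS_CRIT_NS["Goliath"])


sync_ns, gals_ns = synchronous_enc_time(), gals_enc_time()
print(f"synchronous {sync_ns:.2f} ns, GALS {gals_ns:.2f} ns, "
      f"overhead {100 * (gals_ns / sync_ns - 1):.1f} %")
```

Under this reading the GALS version is about 3.7 % slower per encryption despite the faster local clocks, because David needs more cycles.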

Block diagram of David
[Figure: 32-bit ports DataInxDI and DataOutxDO, plus PolicyxDI, ModexSI, PenxSO, TaxSI and debug inputs DebugDataInxDI, DebugEnxSI, DebugClkxCI; blocks: 89-bit LFSR, Controller, Reg-32, Affine Transformation, Inverse Affine Transformation, Multiplicative Inverse, MixColumns, InvMixColumns, Output Select.]

Block diagram of Goliath
[Figure: ports DataInxDI, DataOutxDO, ModexSI, PenxSO, TaxSI and debug inputs DebugDataInxDI, DebugAddrInxDI, DebugEnxSI, DebugClkxCI; 128-bit blocks: 242-bit LFSR, Controller, Reg-128, ShiftRows, InvShiftRows, AddRoundKey, Output Select; Round Key Generator with Key Expansion, a 64 x 32 SRAM and Reg-32 registers; David Communication block with 32-bit links ToDavidTxDO, FrmDavidTxDI, ToDavidPxDO, FrmDavidPxDI.]

Goliath to Synchronous Interface
[Figure: signal transition graph with transitions Req+/-, Ack+/-, Pen+/-, Ta+/-, Ri+/-, Ai+/- and its gate-level implementation using AO211, INV1, OR21, NAND21, NAND31, NAND41 and AND21 cells.]

Goliath to David
[Figure: signal transition graph with transitions Pen+/-, Req+/-, Ri+/-, Ta+/-, Ai+/-, Ack+/- and its gate-level implementation using AO211, INV1 and AND21 cells.]

David to Goliath
[Figure: signal transition graph with transitions Ta+/-, Ai+/-, Pen+/-, Ri+/-, Req+/-, Ack+/- and its gate-level implementation using AO211, INV1, AND21, AND31, OR21 and BUFL cells, with an internal node Z.]
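These interface graphs are built around Req/Ack transitions (Req+, Ack+, Req-, Ack-). Assuming they follow the standard four-phase (return-to-zero) handshake, a small checker for the Req/Ack ordering in a transition trace might look like this. The port signals Pen, Ta, Ri and Ai are deliberately ignored, and the function name is hypothetical:

```python
def check_four_phase(trace, req="Req", ack="Ack"):
    """Return (True, None) if the Req/Ack events in `trace` follow the four-phase
    protocol Req+ -> Ack+ -> Req- -> Ack- (starting from Req = Ack = 0),
    else (False, index of the first offending event)."""
    level = {req: 0, ack: 0}
    for i, event in enumerate(trace):
        sig, edge = event[:-1], event[-1]
        if sig not in level:
            continue                    # other signals (Pen, Ta, Ri, Ai, ...) are not checked
        new = 1 if edge == "+" else 0
        if level[sig] == new:
            return False, i             # a signal cannot rise (or fall) twice in a row
        if sig == req and level[req] != level[ack]:
            return False, i             # Req may only toggle once Ack has followed it
        if sig == ack and level[ack] == level[req]:
            return False, i             # Ack may only toggle in response to Req
        level[sig] = new
    return True, None
```

For example, `check_four_phase(["Req+", "Pen+", "Ack+", "Req-", "Ack-"])` is accepted, while a trace starting with `"Ack+"` is rejected at index 0.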

Scan-test configuration
[Figure: two GALS modules, each an LS island with asynchronous port FSMs (Pen, Ta, Req, Ack, Ri, Ai) and a local clock generator driving Lclk; the ScanIn, ScanEn and ScanClk inputs and the ScanOut output connect the modules for test.]
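The slide shows only the test wiring. A minimal sketch of the generic mux-D scan principle, assuming that in test mode the flip-flops are clocked by ScanClk rather than the local clock and that ScanEn selects between shifting and functional capture; the class and helper names are ours:

```python
class ScanChain:
    """Mux-D scan chain: with scan_en=1 the flops form a shift register
    (ScanIn -> flops[0] -> ... -> flops[-1] -> ScanOut); with scan_en=0
    each flop captures its functional input."""

    def __init__(self, length):
        self.flops = [0] * length

    def clock(self, scan_en, scan_in=0, functional=None):
        """One ScanClk edge. Returns the ScanOut value seen before the edge."""
        scan_out = self.flops[-1]
        if scan_en:
            self.flops = [scan_in] + self.flops[:-1]
        else:
            self.flops = list(functional)
        return scan_out


def load(chain, pattern):
    """Shift a test pattern in so that afterwards flops[i] == pattern[i]."""
    for bit in reversed(pattern):
        chain.clock(1, bit)


def unload(chain):
    """Shift the captured response out and return it in flop order."""
    outs = [chain.clock(1, 0) for _ in range(len(chain.flops))]
    return list(reversed(outs))
```

A test then consists of `load`, one capture clock with `scan_en=0` (the circuit's response), and `unload`, whose result is compared against the expected response.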

Stuck-at-fault test coverage
There are a total of 154,604 stuck-at faults in the entire circuit, of which only 182 lie within the asynchronous finite state machines. Straightforward test-vector generation with TetraMAX fails to detect 3,089 faults. Simulating a simple encryption/decryption operation detects 2,796 of these. Combining the two methods gives a total test coverage above 99.8%.
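The combined figure follows from simple counting: the faults left over are those missed by test-vector generation minus those caught in simulation. A sketch assuming coverage is counted as detected faults over all faults (ATPG tools also distinguish untestable fault classes, which this ignores):

```python
TOTAL_FAULTS = 154_604
ATPG_UNDETECTED = 3_089   # left undetected by TetraMAX test-vector generation
SIM_DETECTED = 2_796      # of those, detected by simulating an encryption/decryption


def coverage_percent(total, undetected):
    return 100.0 * (total - undetected) / total


atpg_only = coverage_percent(TOTAL_FAULTS, ATPG_UNDETECTED)
remaining = ATPG_UNDETECTED - SIM_DETECTED
combined = coverage_percent(TOTAL_FAULTS, remaining)
print(f"ATPG only: {atpg_only:.2f} %, combined: {combined:.2f} % ({remaining} faults left)")
```

This gives about 98.0 % for test-vector generation alone and 99.81 % combined, with 293 faults undetected.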

Distribution of the first SubBytes operation
[Figure: four histograms of frequency versus time [ns] (0 to 500 ns), one per mode: Mode 00: as fast as possible; Mode 01: slightly random; Mode 10: mostly random; Mode 11: pre-programmed policy.]
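The slides do not show how a mode is turned into delay-line settings. A minimal sketch, assuming a pseudo-random source of the kind the LFSRs in the block diagrams could provide. It uses a 16-bit maximal-length LFSR instead of the 89- and 242-bit ones for brevity, and the mode-to-slice mapping and the POLICY list are entirely hypothetical:

```python
# Hypothetical pre-programmed sequence of delay-slice settings for mode 11
POLICY = [4, 8, 16, 8]


def lfsr16_step(state):
    """One step of the 16-bit Fibonacci LFSR x^16 + x^14 + x^13 + x^11 + 1."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def lfsr_period(seed=0xACE1):
    """Number of steps until the LFSR returns to its seed."""
    state, n = lfsr16_step(seed), 1
    while state != seed:
        state, n = lfsr16_step(state), n + 1
    return n


def delay_slices(mode, state, step):
    """Pick the number of extra delay slices for the next local clock period."""
    state = lfsr16_step(state)
    if mode == 0b00:
        slices = 0                       # as fast as possible
    elif mode == 0b01:
        slices = state & 0x3             # slightly random: 0..3 extra slices
    elif mode == 0b10:
        slices = state & 0x1F            # mostly random: 0..31 extra slices
    elif mode == 0b11:
        slices = POLICY[step % len(POLICY)]
    else:
        raise ValueError("mode must be 0b00..0b11")
    return slices, state


def schedule(mode, n, seed=0xACE1):
    """Delay-slice settings for n consecutive clock periods."""
    state, out = seed, []
    for step in range(n):
        slices, state = delay_slices(mode, state, step)
        out.append(slices)
    return out
```

More delay slices mean a longer local clock period, so widening the random range spreads the time at which a given operation (such as the first SubBytes) occurs.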

Simulation result
[Figure: simulated timing diagram. Interface: Clk, Req/Ack and Strb, with "Input for Next Operation" and "Output of Last Operation". Goliath: "Request to start" and "Acknowledge" (Req/Ack), Start, rounds 0 to 10, End. David (Tschopp) and David (Perels): Req/Ack and local Clk, with phases marked "more random", "less random", "fast", "less random", "more random" and a clock pause. Local clock frequency over the operation: ~100, ~120, ~170, ~190, ~200, ~200, ~200, ~190, ~180, ~150 MHz. Time axis marks: 7200 ns, 7600 ns, 8 µs.]

Operation modes of Acacia

Operation     I/O clock   Encr.     Throughput   Energy
mode          [MHz]       [ns]      [Mb/s]       [mJ/Mb]
Acacia-00      50           720.0   177.7        1.232
Acacia-01      50           880.0   145.4        1.362
Acacia-10      50         2,440.0    57.1        2.704
Acacia-11      50           920.0   139.1        1.198
Synchronous   150           779.2   164.2        0.976
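The throughput column matches one 128-bit block per encryption time for four of the five rows (for example 128 bit / 720 ns ≈ 177.8 Mb/s). A short consistency check under that assumption:

```python
BLOCK_BITS = 128  # one AES block per encryption (assumption supported by most rows)

# (encryption time [ns], listed throughput [Mb/s]) from the "Operation modes of Acacia" table
TABLE = {
    "Acacia-00": (720.0, 177.7),
    "Acacia-01": (880.0, 145.4),
    "Acacia-10": (2440.0, 57.1),
    "Acacia-11": (920.0, 139.1),
    "Synchronous": (779.2, 164.2),
}


def throughput_mbps(enc_ns, block_bits=BLOCK_BITS):
    # bits per ns equals Gb/s; multiply by 1000 for Mb/s
    return block_bits / enc_ns * 1000.0


def inconsistent_rows(tolerance=0.2):
    """Rows whose listed throughput does not match block_bits / encryption time."""
    return [name for name, (enc_ns, listed) in TABLE.items()
            if abs(throughput_mbps(enc_ns) - listed) > tolerance]


print(inconsistent_rows())
```

Only the Acacia-10 row disagrees: 2,440 ns would give about 52.5 Mb/s, while 57.1 Mb/s corresponds to roughly 2,240 ns, so one of the two entries in that row is probably a transcription slip.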

Clock period versus delay-line settings
[Figure: "Acacia Local Clock Generator Period (Chip #1)": local clock period [ns] (about 4 to 14 ns) versus number of delay slices (0 to 35) for Goliath, Perels and Tschopp, compared with simulation.]

Clock frequency versus delay-line settings
[Figure: "Goliath Clock Generator Frequency (Chip #1 to Chip #14)": local clock frequency [MHz] (about 60 to 200 MHz) versus number of delay slices (0 to 35) for Chip #1 through Chip #14, compared with simulation.]

Power consumption vs maximum GALS module frequency
[Figure: "Acacia Power Dissipation": power consumption [mW] (about 120 to 260 mW) versus clock frequency of the GALS modules [MHz] (60 to 180 MHz).]

Power consumption of different operation modes
[Figure: "Acacia Power Dissipation with Different Operation Modes": power consumption [mW] (about 150 to 260 mW) for Mode00, Mode01, Mode10 and Mode11 versus I/O clock period (axis labelled [MHz], range 0 to 20).]

F. Kağan Gürkaynak