Amon: Advanced Mesh-Like Optical NoC

Sebastian Werner, Javier Navaridas and Mikel Luján
Advanced Processor Technologies Group, School of Computer Science, The University of Manchester

Bottleneck: On-chip Interconnects in Many-core Systems
Metal wires:
Increasing signal delay with technology scaling, while gate delays decrease
Increasing power consumption in global core-to-core interconnects due to repeaters, regenerators, or buffers
-> Performance and power demands cannot be met by metal wires in future many-core chips [1]
[1] O'Connor, Ian, and Gabriela Nicolescu. Integrated Optical Interconnect Architectures for Embedded Systems. Springer Science & Business Media, 2012.

Motivation for Optical Networks-on-chip
1. Optical data transmission by using light -> low latency (signal propagation 15 ps/mm; global metal wire: ~262 ps/mm)
2. Data can be transmitted simultaneously on the same waveguide at different wavelengths -> high bandwidth without adding wires
3. (Almost) distance-independent energy consumption
Huge potential, BUT: nanophotonic components may have high power demands
-> Novel network architectures required to enable efficient, low-power operation
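To make the latency argument concrete, the two per-millimetre figures quoted on the slide can be compared over a few distances. This is only a rough sketch: the function name and the assumption that delay grows linearly with distance are illustrative, not taken from the slides.

```python
# Per-millimetre signal delays as quoted on the slide (picoseconds per mm).
OPTICAL_PS_PER_MM = 15.0
METAL_PS_PER_MM = 262.0


def propagation_delay_ps(distance_mm: float, ps_per_mm: float) -> float:
    """Delay over a link, assuming (as a simplification) linear scaling."""
    if distance_mm < 0:
        raise ValueError("distance must be non-negative")
    return distance_mm * ps_per_mm


if __name__ == "__main__":
    for distance in (1, 5, 10, 20):
        optical = propagation_delay_ps(distance, OPTICAL_PS_PER_MM)
        metal = propagation_delay_ps(distance, METAL_PS_PER_MM)
        print(f"{distance:>2} mm: optical {optical:7.1f} ps, metal {metal:7.1f} ps")
```

Under this linear assumption the ratio between the two stays constant at roughly 17x, whatever the distance.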

Optical on-chip Data Transmission
[Figure: a laser source feeds wavelengths λ1 and λ2 through a coupler into a waveguide. Senders A and B use ring modulators (microring resonators driven by backend circuitry); Receivers A and B use ring filters and photodetectors, e.g. a ring filter with λ1 resonance.]

Ring Filters for Switching (1)
[Figure: a ring filter with resonance λ2 placed between Waveguide 1 and Waveguide 2. Light at λ1 is shown alongside light at λ2, which is coupled to the drop port.]

Ring Filters for Switching (2)
Number of λ = number of ring filters
[Figure: ring filters for λ1, λ2, ..., λn]

Optical Switch for 2D Mesh
[Figure: 3x3 mesh with nodes addressed by λ1 to λ9; example paths on λ3 and λ9 lead to the detector responding to λ3 and the detector responding to λ9.]

ONoC Design Properties
Network design using microring resonators is based on deterministic routing
Hardwired, pre-defined paths between each source-destination pair
Switching equals routing algorithm
-> ONoC design comprises topology, routing algorithm and switch architecture

Contention in Optical NoCs
[Figure: 3x3 mesh with wavelengths λ1 to λ9; more than one transmission on λ6 heads for the detector responding to λ6 and they meet at ejection.]
Contention: only one sender per destination at a time!
Underlying control network required for destination reservation -> Req / Ack message exchange
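The reservation idea can be sketched as a small table that grants a destination to one sender at a time. This is a minimal illustration only: the class and method names are hypothetical, and the slides do not describe how the control network is actually implemented beyond the Req / Ack exchange.

```python
class ReservationTable:
    """Toy model of destination reservation (names are hypothetical)."""

    def __init__(self) -> None:
        # Maps a destination node to the sender currently allowed to use it.
        self._owner: dict[int, int] = {}

    def request(self, sender: int, destination: int) -> str:
        """Handle a Req message; answer 'ack' if granted, 'nack' otherwise."""
        holder = self._owner.get(destination)
        if holder is None or holder == sender:
            self._owner[destination] = sender
            return "ack"
        return "nack"

    def release(self, sender: int, destination: int) -> None:
        """Free the destination once its current holder has finished."""
        if self._owner.get(destination) == sender:
            del self._owner[destination]


if __name__ == "__main__":
    table = ReservationTable()
    print(table.request(sender=1, destination=6))  # first sender is granted
    print(table.request(sender=4, destination=6))  # second sender must wait
    table.release(sender=1, destination=6)
    print(table.request(sender=4, destination=6))  # granted after release
```

A sender that receives a nack would retry later; the retry policy is left out here because the slides do not specify one.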

Objectives of low-power ONoC Design
Low laser power:
Min. path loss -> short paths -> low diameter
Small #λ for addressing -> fewer laser sources
Low ring heater power:
Small #microrings (20 µW/ring)
Small #λ -> fewer ring filters for switching
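The ring heater objective follows directly from the 20 µW/ring figure: heater power grows linearly with the microring count. A minimal sketch, assuming every ring draws the quoted value (the function name and the example ring counts are illustrative, not results from the slides):

```python
# Heater power per microring as quoted on the slide (watts).
HEATER_POWER_PER_RING_W = 20e-6


def ring_heater_power_w(num_rings: int) -> float:
    """Total heater power, assuming each ring draws the same 20 µW."""
    if num_rings < 0:
        raise ValueError("ring count must be non-negative")
    return num_rings * HEATER_POWER_PER_RING_W


if __name__ == "__main__":
    # Ring counts below are arbitrary examples, not figures from the talk.
    for rings in (1_000, 10_000, 100_000):
        print(f"{rings:>7} rings -> {ring_heater_power_w(rings):.2f} W")
```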

State-of-the-art solutions are
1. Optical Spidergon [1]
2. QuT [2]
Aim: low power; microring resonators; ring-based topology
[1] S. Koohi and S. Hessabi, Scalable architecture for a contention-free optical network on-chip, Journal of Parallel and Distributed Computing, vol. 72, no. 11, pp. 1493-1506, 2012.
[2] P. K. Hamedani, N. E. Jerger, and S. Hessabi, QuT: A low-power optical network-on-chip, in NOCS, 2014. IEEE, 2014, pp. 80-87.

Optical Spidergon
[Figure: 16-node Optical Spidergon topology, nodes 1 to 16]
N/2 λs in network for addressing -> reduces laser power
Different paths to prevent overwriting data!
Switch design: 1 switch design; (N/2-1) ring filters for switching at each node (figure labels: λ5,λ6,λ7,λ8 and λ2,λ3,λ4)

QuT
[Figure: 16-node QuT topology, nodes 1 to 16]
N/4 λs in network for addressing
2 switch designs (odd/even)
Even switches are cheap
Odd switches are still as expensive as in Spidergon (ring-based topologies have similar switching demands)

Spidergon/QuT
+ N/2 and N/4 wavelengths in network, providing different paths to avoid contention
- Long paths in ring topologies
- Large number of ring filters required for switching

Proposal: Mesh-based Topology
[Figure: 3x3 mesh, nodes 1 to 9, with example wavelengths λ6, λ9]
Advantages over ring topologies in ONoCs:
Shorter paths/diameter than ring-based networks
In XY routing: at most N-1 ring filters in each switch (every other node in column)
Problem: N λs in mesh -> larger laser power than N/4 (QuT)
Solution: split mesh in 4 parts
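The wavelength counts discussed so far can be tabulated side by side. The sketch below only restates the N/2, N/4 and N figures from the slides and assumes that splitting the mesh into 4 parts leaves N/4 addressing wavelengths, in line with the N/4 stated in the summary; the function name is illustrative.

```python
def addressing_wavelengths(num_nodes: int) -> dict[str, int]:
    """Number of addressing wavelengths per topology for an N-node network."""
    if num_nodes <= 0 or num_nodes % 4 != 0:
        raise ValueError("this sketch assumes a positive node count divisible by 4")
    return {
        "Optical Spidergon": num_nodes // 2,      # N/2
        "QuT": num_nodes // 4,                    # N/4
        "plain mesh": num_nodes,                  # N
        "mesh split in 4 parts": num_nodes // 4,  # assumed N/4 per the summary
    }


if __name__ == "__main__":
    for n in (64, 144, 256):
        print(n, addressing_wavelengths(n))
```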

Amon
[Figure: 64-node Amon topology, nodes 1 to 64]

Amon: Routing
[Figure: example routes on the 64-node Amon topology, labelled λ10 and λ16]

Contention-free Routing
[Figure: routes labelled λ8 on the 64-node Amon topology]

Switch Architecture
Other switches are designed accordingly

36 Node Amon
[Figure: 36-node Amon topology, nodes 1 to 36]

48 Node Amon
Scaling: symmetrical to X/Y axis
[Figure: 48-node Amon topology, nodes 1 to 48]

Diameter
[Figure: network diameter comparison]
Much smaller diameter with better scalability -> shorter paths -> less laser power

Design Configuration
Aim: low-power design; parameters are chosen accordingly:
22 nm low-voltage technology library
Core data rate: 4 GHz
Modulator/Detector: 8 Gb/s
Flit size: 16 bit
Standard laser type: laser is always on
Tile width: 1 mm
Injection rate: 0.5
Data is modulated on 8 wavelengths per sender
Control network: Multi-Write-Single-Read bus
Implementation with the DSENT [1] network modeling tool
64-, 144- and 256-node networks to assess scalability
[1] C. Sun et al., DSENT - a tool connecting emerging photonics with electronics for opto-electronic networks-on-chip modeling, in NOCS, 2012. IEEE, 2012, pp. 201-210.

Number of Microrings
Microrings: modulators, detectors, filters
[Figure: three charts of #microrings with annotated savings; the first shows 54% and 33%, the others show 52%, 50%, 29% and 26%]
Up to 54% savings in microrings!

Area Results
[Figure: three area charts with annotated savings of 31% / 18%, 30% / 16% and 29% / 14%]

Power Consumption
[Figure: power charts with annotated savings]
64 Nodes: 52% / 39%
144 Nodes: 70% / 60%
256 Nodes: 78% / 71%

Summary
Amon is a novel mesh-based optical NoC comprising topology, switch architecture and routing algorithm
Compared to ring-based Spidergon and QuT, Amon saves:
Laser power: short paths -> lower path losses; N/4 wavelengths in network
Ring heater power: fewer ring filters for switching -> less ring tuning required
Total power: savings up to 78% / 71%
Area: due to fewer microrings (up to 31% / 18%)
Mesh structure suitable for tile-based VLSI implementation

Thank you! Questions?

Zero Load Latency
Control network:
Packet size: 2 bit for packet type (req/ack/nack)
4 GHz core clock and 8 Gb/s modulator: 2 bits per clock cycle
Total latency: modulation (1 cycle) + on-the-fly (1 cycle) + detection (1 cycle) = 3 cycles
Destination checking: 6 cycles (req + ack)
Data network:
Assuming a 128 bit data packet
Data transmission with 8 modulators: 128 / 8 / 2 = 8 cycles for modulation, 1 on-the-fly, 8 for detection -> 17 cycles
Total: 23 cycles
With a 200 ps clock cycle and 15 ps/mm propagation delay, every destination within 18 hops is reached in one clock cycle -> larger network size has insignificant impact on latency
Adding modulators or using faster ones (up to 40 Gb/s have been fabricated) further decreases latency
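The cycle counts above can be reproduced with a few lines of arithmetic. This is a sketch of the slide's calculation only; the function and parameter names are illustrative, and the default values are the ones quoted on the slide.

```python
def control_latency_cycles(messages: int = 2) -> int:
    """Each control message takes modulation + on-the-fly + detection = 3 cycles.

    Destination checking needs a req and an ack, hence 2 messages by default.
    """
    cycles_per_message = 1 + 1 + 1
    return messages * cycles_per_message


def data_latency_cycles(packet_bits: int = 128,
                        modulators: int = 8,
                        bits_per_cycle: int = 2) -> int:
    """Serialise the packet over all modulators, then add 1 on-the-fly cycle.

    bits_per_cycle = 2 follows from an 8 Gb/s modulator at a 4 GHz core clock.
    """
    bits_per_cycle_total = modulators * bits_per_cycle
    # Ceiling division so that partially filled cycles still count.
    serialisation = -(-packet_bits // bits_per_cycle_total)
    modulation = serialisation
    detection = serialisation
    return modulation + 1 + detection


if __name__ == "__main__":
    control = control_latency_cycles()
    data = data_latency_cycles()
    print(f"control: {control} cycles, data: {data} cycles, total: {control + data}")
```

With the default values this gives 6 control cycles and 17 data cycles, matching the 23-cycle total on the slide; doubling the number of modulators in this model would shorten the data part to 9 cycles.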

Insertion Loss Parameters

Control Network
MWSR power: 21%, 19%, and 17% of Amon (64, 144, 256 nodes)
Only 1 modulator compared to 8 leads to small ring heater power and area
Waveguide area becomes significant, as one waveguide reaching every other node in the ONoC is added for each node

Control Network
Req - Ack/NegAck messages for destination reservation
Commonly implemented as a Multiple-Write-Single-Read bus

Technology Parameters: Area
Waveguide->Pitch = 4e-6 # m
Ring->Area = 100e-12 # m^2
Photodetector->Area = 10e-12 # m^2

Power Consumption
Amon total power:
64 Nodes: 0.83 W
144 Nodes: 4 W
256 Nodes: 15 W

Area Results
[Figure: three area charts, axis unit mm^2]

Power Consumption
[Figure: power charts in watts for 64, 144 and 256 nodes]

VLSI Layout: Shared Laser Sources
[Figure: layout with laser sources, coupler and splitter]


Amon: Evaluation & Comparison
[Table: microring area (m^2), waveguide area (m^2), total area normalized to Amon]
For comparison: ENoC 64-node mesh: area 1.77e-06 (~40% of Amon)

QuT
[Figure: 16-node QuT topology, nodes 1 to 16]
4 injection channels, for destinations at < N/4 (left/right) and > N/4 (left/right) hop distance
N/4 wavelengths in network -> fewer switching rings
-> Same #modulators at each node
But: ring topology causes long paths, leading to high insertion loss (IL)