Out of order execution allows
A. Requires extra stages in the pipeline
B. The processor to exploit parallelism between instructions.
C. Is used mostly in handheld computers
D. A, B, and C
E. A and B

CMP is short for
A. Compare
B. Common mode parallelism
C. Cache Mostly Programs
D. Chip Multiprocessor
E. Concurrent Machine Programming

Coherence and consistency affect
A. The order in which memory operations take effect
B. How your food tastes.
C. How an OOO processor can execute add instructions
D. The number of cores that can be in a CMP
E. The depth of a processor's pipeline.

The final is
A. Only about topics covered on even-numbered days of the course
B. All multiple choice, and the answers are all B
C. Comprehensive
D. Similar in format to the midterm
E. C and D

CAPEs
A. Are for super heroes.
B. Are open until the start of finals week
C. Are very important.
D. All of the above.
E. None of the above.

TA Evaluations and CAPE
Please fill out your TA evaluations. You should have received a link to do so.
CAPEs are also open.
Clicker evaluation: https://www.surveymonkey.com/s/cse141sp14_swanson

Final review
Come with questions.
Next Tuesday (the lecture will probably be brief).
Next Thursday.
The final is comprehensive. Look over the slides, homeworks, quizzes, and midterm.
6/10/2014, 8:00am-11:00am, in this room.

Storage Steven Swanson

Humanity processed 9 Zettabytes in 2008*. Welcome to the Data Age! (*http://hmi.ucsd.edu)

Solid State Memories
NAND flash: ubiquitous, cheap; sort of slow, idiosyncratic.
Phase change, spin torque MRAMs, etc.: on the horizon; DRAM-like speed; DRAM or flash-like density.

[Figure: log-log plot of Bandwidth Relative to Disk (y-axis) against 1/Latency Relative to Disk (x-axis). Points: Hard Drives (2006), PCIe-Flash (2007), PCIe-PCM (2010), PCIe-Flash (2012), PCIe-PCM (2013?), DDR Fast NVM (2016?). Annotations: 5917x, 2.4x/yr; 7200x, 2.4x/yr.]

Disk Density: 1 Tb/square inch

Hard Drive Cost
Today at newegg.com: $0.04/GB ($0.00004/MB). Desktop, 2 TB.

Why Are Disks Slow?
They have moving parts :-( The disk itself and the head/arm.
The head can only read at one spot. High-end disks spin at 15,000 RPM, so data is, on average, 1/2 a revolution away: 2 ms. Power consumption limits spindle speed. Why not run it in a vacuum?
The head has to position itself over the right track. Currently about 150,000 tracks per inch. Positioning must be accurate to within about 175 nm. Takes 3-13 ms.
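The 2 ms figure follows directly from the spindle speed. A minimal sketch of that arithmetic is below; the function names are made up for illustration, and the 3 ms seek time in the usage line is just the low end of the 3-13 ms range quoted on the slide.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency: the time for half a revolution, in ms."""
    ms_per_revolution = 60_000.0 / rpm  # 60,000 ms in a minute
    return ms_per_revolution / 2.0


def avg_access_time_ms(rpm, seek_ms):
    """Seek time plus average rotational latency (transfer time ignored)."""
    return seek_ms + avg_rotational_latency_ms(rpm)


print(avg_rotational_latency_ms(15_000))  # 2.0
print(avg_access_time_ms(15_000, 3.0))    # 5.0
```

At 7,200 RPM the same calculation gives roughly 4.2 ms of rotational latency, which is why spindle speed matters so much for random accesses.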

Making Disks Faster
Caching: everyone tries to cache disk accesses! The OS, the disk controller, and the disk itself.
Access scheduling: reordering accesses can reduce both rotational and seek latencies.
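To see why reordering helps, here is a small sketch that compares head travel for requests served in arrival order against a simple elevator-style sweep. This is an illustrative model only: it counts tracks crossed, ignores rotational position, and the track numbers are invented.

```python
def total_seek_distance(start, requests):
    """Total number of tracks the head crosses serving requests in order."""
    position, total = start, 0
    for track in requests:
        total += abs(track - position)
        position = track
    return total


def elevator_order(start, requests):
    """Sweep upward from the start track, then come back down for the rest."""
    upward = sorted(t for t in requests if t >= start)
    downward = sorted((t for t in requests if t < start), reverse=True)
    return upward + downward


requests = [95, 10, 60, 5, 80]  # hypothetical track numbers
print(total_seek_distance(50, requests))                      # 310
print(total_seek_distance(50, elevator_order(50, requests)))  # 135
```

Real schedulers in the OS or the drive firmware also account for rotational position and fairness, which this sketch leaves out.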

RAID! Redundant Array of Independent (Inexpensive) Disks
If one disk is not fast enough, use many. Multiplicative increase in bandwidth. Multiplicative increase in ops/sec. Not much help for latency.
If one disk is not reliable enough, use many. Replicate data across the disks. If one of the disks dies, use the replica data to continue running and re-populate a new drive.
Historical footnote: RAID was invented by one of the textbook authors (Patterson).

RAID Levels
There are several ways of ganging together a bunch of disks to form a RAID array. They are called levels.
Regardless of the RAID level, the array appears to the system as a sequence of disk blocks.
The levels differ in how the logical blocks are arranged physically and how the replication occurs.

RAID 0
Double the bandwidth (with two disks).
For an n-disk array, block i lives on disk i mod n.
Worse for reliability: if one of your drives dies, all your data is corrupt. You have lost every nth block.
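A minimal sketch of round-robin striping, assuming logical blocks are dealt out to the disks in order (logical block i goes to disk i mod n). The function names are hypothetical; real arrays usually stripe in larger chunks than a single block.

```python
def raid0_locate(logical_block, num_disks):
    """Return (disk index, block offset on that disk) for a logical block."""
    return logical_block % num_disks, logical_block // num_disks


def blocks_lost(failed_disk, num_disks, total_blocks):
    """Logical blocks that become unreadable when one disk fails."""
    return [b for b in range(total_blocks) if b % num_disks == failed_disk]


print(raid0_locate(5, 2))    # (1, 2)
print(blocks_lost(0, 2, 6))  # [0, 2, 4]
```

The second call shows the reliability problem: a single failed disk removes every nth block from the logical address space.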

RAID 1
Mirror your data. 1/2 the capacity, but you can tolerate a disk failure.
Double the bandwidth for reads. Same bandwidth for writes.

Stripe your data across a bunch of disks.
Use one bit to hold parity information: the number of 1s at corresponding locations across the drives is always even.
If you lose one drive, you can reconstruct it from the others.
Read and write all the disks in parallel.
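Even parity across drives is the same as XOR-ing the corresponding bits, so a lost drive can be rebuilt by XOR-ing everything that survives. The sketch below illustrates this on two-byte blocks; the helper names and data values are invented for the example.

```python
from functools import reduce


def parity_block(blocks):
    """XOR the corresponding bytes of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors plus parity."""
    return parity_block(list(surviving_blocks) + [parity])


data = [b"\x0f\xa0", b"\x33\x55", b"\xf0\x0c"]  # three hypothetical data drives
parity = parity_block(data)
recovered = reconstruct([data[0], data[2]], parity)  # pretend drive 1 died
print(recovered == data[1])  # True
```

This only works for a single failure: with two drives missing, the XOR of the survivors no longer determines either one.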

The Flash Juggernaut

Flash is Fast!
Random 4KB reads from user space:
                     Latency            Bandwidth
Hard Drives          7.1 ms  (1x)       2.6 MB/s  (1x)
PCIe-Flash (2007)    68 us   (104x)     250 MB/s  (96x)

Floating Gate Flash Operations
[Figure: a floating-gate cell under the Read, Program, and Erase operations, with terminal voltages labeled 0V, 1V, 5V, and 20V.]

Organizing Flash Cells into Chips ~16K blocks/chip ~16-64Gbits/chip

Flash Operations
A chip is divided into blocks (Block 0 ... Block n), and each block into pages (Page 0 ... Page n).
Erase blocks. Program pages.
SLC: Single Level Cell == 1 bit
MLC: Multi Level Cell == 2 bits
TLC: Triple Level Cell == 3 bits

Cell type                  Endurance (cycles)   Data retention   Read latency   Program latency
Single-Level Cell (1 bit)  100,000              10 years         25 us          100-200 us
Multi-Level Cell (2 bits)  5000-10,000          3-10 years       25-37 us       600-1800 us
Triple-Level Cell (3 bits) ~500-1000            3 years          60-120 us      500-6500 us

3D NAND
SLC, MLC, and TLC NAND cells are 4F^2 devices: 1.33-4F^2 per bit (4F^2 divided by 3, 2, or 1 bits per cell).
Higher densities require 3D designs. Samsung has demonstrated 24 layers. 2-4x density boost.
http://bcove.me/xz2o1af5

Flash Failure Mechanisms
Program/Erase (PE) wear: permanent damage to the gate oxide at each flash cell, caused by high program/erase voltages. The damage causes charge to leak off the floating gate.
Program disturb: data corruption caused by interference from programming adjacent cells. No permanent damage.

Making Disks out of Flash Chips
Flash chips: read pages, write pages, erase blocks; hierarchical addresses; PE wear.
Disks: read, write; flat address space; no wear limitations.

Writing Data
The SSD maintains a map between virtual logical block addresses and physical flash locations.

Writing More Data
When you overwrite data, it goes to a new location.

Flash Translation Layer (FTL)
The FTL sits between software and the flash.
User (software) side: logical block addresses.
Flash side: write pages in order; erase/write granularity; wears out.
FTL responsibilities: logical-to-physical map; wear leveling; power cycle recovery.

Centralized FTL State

Map
LBA   Physical Page Address
0     Block 5, Page 7
2k    Block 27, Page 0
4k    Block 10, Page 2

Write Point (pages of the current block; all-1s pages are erased)
101001011010001
010100100101011
101010110101001
111111111111111
111111111111111
111111111111111

Block Info Table (Next Sequence Number: 12)
Block   Erased   Erase Count   Valid Page Count   Sequence Number   Bad Block Indicator
0       False    3             15                 5                 False
1       True     7             0                  -                 False
2       False    0             4                  9                 False

Read
1. Software: read data at LBA 2k.
2. FTL: look up the LBA in the map.
3. Flash: perform the flash operation.

Map
LBA   Physical Page Address
0     Block 5, Page 7
2k    Block 27, Page 0
4k    Block 10, Page 2

Write (mid block)
Write 0101101011001010 to LBA 2k. Write Point = Block 2, Page 5.

Map
LBA   Physical Page Address
0     Block 5, Page 7
2k    Block 0, Page 0
4k    Block 10, Page 2

Page contents shown: 1010010111010101, 0101001010111011, 1010101101001010

Block Info Table (Next Sequence Number: 12)
Block   Erased   Erase Count   Valid Page Count   Sequence Number   Bad Block Indicator
0       False    3             15                 5                 False
1       True     7             0                  -                 False
2       False    0             4                  9                 False

Write (after the write)
Write 0101101011001010 to LBA 2k. Write Point: Block 2, Page 5 -> Block 2, Page 6.

Map
LBA   Physical Page Address
0     Block 5, Page 7
2k    Block 0, Page 0 -> Block 2, Page 5
4k    Block 10, Page 2

Page contents shown: 1010010111010101, 0101001010111011, 1010101101001010, 0101101011001010 (new)

Block Info Table (Next Sequence Number: 12)
Block   Erased   Erase Count   Valid Page Count   Sequence Number   Bad Block Indicator
0       False    3             15 -> 14           5                 False
1       True     7             0                  -                 False
2       False    0             4 -> 5             9                 False
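The write path on these slides can be sketched in a few lines. This is a toy model under stated assumptions, not the actual FTL: the class and attribute names are invented, LBA 2k is written as 2048, 16 pages per block is assumed, and the next block is assumed to be erased when the write point rolls over.

```python
class SimpleFTL:
    """Toy FTL: out-of-place writes, a logical-to-physical map, valid counts."""

    def __init__(self, num_blocks, pages_per_block):
        self.pages_per_block = pages_per_block
        self.lba_map = {}                      # LBA -> (block, page)
        self.valid_pages = [0] * num_blocks    # valid page count per block
        self.flash = {}                        # (block, page) -> data
        self.write_point = (0, 0)

    def read(self, lba):
        # Map lookup, then the flash read.
        return self.flash[self.lba_map[lba]]

    def write(self, lba, data):
        old = self.lba_map.get(lba)
        if old is not None:
            self.valid_pages[old[0]] -= 1      # the old copy becomes stale
        block, page = self.write_point
        self.flash[(block, page)] = data       # program the next free page
        self.lba_map[lba] = (block, page)
        self.valid_pages[block] += 1
        page += 1
        if page == self.pages_per_block:       # assumption: next block is erased
            block, page = block + 1, 0
        self.write_point = (block, page)


# Recreate the state from the slide, then write to LBA 2k.
ftl = SimpleFTL(num_blocks=28, pages_per_block=16)
ftl.lba_map = {0: (5, 7), 2048: (0, 0), 4096: (10, 2)}
ftl.valid_pages[0], ftl.valid_pages[2] = 15, 4
ftl.write_point = (2, 5)
ftl.write(2048, "0101101011001010")
print(ftl.lba_map[2048], ftl.valid_pages[0], ftl.valid_pages[2], ftl.write_point)
# (2, 5) 14 5 (2, 6)
```

The stale copy in Block 0 is not erased by the write. It simply stops being counted as valid, and reclaiming it is left to a later erase step.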

Erase, step 1: move valid pages out of Block 2

Block Info Table
Block   Erased   Erase Count   Valid Page Count   Sequence Number   Bad Block Indicator
0       False    3             13                 5                 False
1       False    7             1                  12                False
2       False    0             3                  9                 False

Page contents shown: 0101011010101010, 1010001010111010, 0101011010010101, 0101110100101000, 1101000101101001, 0101011010100111, 0101110100010110, 1011101000101010, 1010010111010101, 0101001010111011, 1010101101001010

Erase, step 2: update the map, valid page counts, etc.

Block Info Table
Block   Erased   Erase Count   Valid Page Count   Sequence Number   Bad Block Indicator
0       False    3             13                 5                 False
1       False    7             1                  12                False
2       False    0             3 -> 0             9                 False

Page contents shown: 0101011010101010, 1010001010111010, 0101011010010101, 0101110100101000, 1101000101101001, 0101011010100111, 0101110100010110, 1010010111010101, 0101001010111011, 1010101101001010, 1010001010111010, 1101000101101001, 0101011010100111, 1011101000101010

Erase, step 3: erase Block 2

Block Info Table
Block   Erased          Erase Count   Valid Page Count   Sequence Number   Bad Block Indicator
0       False           3             13                 5                 False
1       False           7             1                  12                False
2       False -> True   0 -> 1        0                  -                 False

Page contents shown: 1010010111010101, 0101001010111011, 1010101101001010, 1010001010111010, 1101000101101001, 0101011010100111
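The three erase steps (move valid pages, update the map and counts, erase the block) can be sketched as one function. This is an illustrative model, not the slides' implementation: the function name, the dictionary layout, and the sample data are all assumptions, and the destination block is assumed to have enough free pages.

```python
def reclaim_block(victim, flash, lba_map, block_info, write_point):
    """Move valid pages out of `victim`, then erase it. Returns the new write point."""
    block, page = write_point
    # A page is valid only if the map still points at it.
    for lba, (b, p) in sorted(lba_map.items()):
        if b != victim:
            continue
        flash[(block, page)] = flash.pop((b, p))  # copy to the write point
        lba_map[lba] = (block, page)              # update the map
        block_info[block]["valid"] += 1
        block_info[victim]["valid"] -= 1
        page += 1                                 # assumption: block has room
    # Erasing discards whatever stale pages remain in the victim.
    for key in [k for k in flash if k[0] == victim]:
        del flash[key]
    info = block_info[victim]
    info["erased"] = True
    info["erase_count"] += 1
    info["sequence"] = None
    return (block, page)


# Hypothetical state: Block 2 holds two valid pages and one stale page.
flash = {(2, 0): "a", (2, 1): "stale", (2, 2): "c", (1, 0): "x"}
lba_map = {0: (2, 0), 8: (2, 2), 4: (1, 0)}
block_info = {
    1: {"valid": 1, "erased": False, "erase_count": 7, "sequence": 12},
    2: {"valid": 2, "erased": False, "erase_count": 0, "sequence": 9},
}
new_write_point = reclaim_block(2, flash, lba_map, block_info, (1, 1))
print(new_write_point, lba_map[0], lba_map[8], block_info[2]["erase_count"])
# (1, 3) (1, 1) (1, 2) 1
```

Choosing which block to reclaim is a separate policy question. Picking blocks with few valid pages keeps copying cheap, while also considering erase counts spreads the wear.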