Performance

2 What do you mean by performance?
The important metric is: how does your application perform? How does your mix of applications perform?
Speed. Is 0.1 seconds different from 0.5 seconds? Is a 1% chance of a response exceeding 2 seconds acceptable? Does total throughput matter, or individual latency?
Cost. Time: do we need to train staff, or hire extra staff? Can it be installed in 6 weeks? Is that acceptable?
Speed example: rendering images. A system of 100 cores renders images such that each core takes 5 seconds to render one image. The average throughput is one image every 0.05 seconds (20 images per second), but each image still takes 5 seconds. That is good for rendering a movie, but useless for a real-time computer game.
The user wants a fast response; the provider wants high throughput. These conflict.
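The render-farm example can be checked with a few lines of arithmetic. This is a minimal Python sketch; the function name `render_farm_stats` is hypothetical, and it assumes every core is kept busy so that throughput scales linearly with the number of cores.

```python
def render_farm_stats(cores: int, seconds_per_image: float) -> dict[str, float]:
    """Latency vs throughput for a farm of identical cores, each rendering
    one image at a time (assumes all cores stay busy)."""
    latency = seconds_per_image              # one image still takes this long
    throughput = cores / seconds_per_image   # images completed per second
    interval = 1.0 / throughput              # average seconds between completed images
    return {"latency_s": latency, "throughput_per_s": throughput, "interval_s": interval}


if __name__ == "__main__":
    stats = render_farm_stats(cores=100, seconds_per_image=5.0)
    print(stats)  # {'latency_s': 5.0, 'throughput_per_s': 20.0, 'interval_s': 0.05}
```

A game needs each frame within its own deadline, so it is the 5-second latency, not the 0.05-second interval, that decides whether the system is acceptable.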

3 Analysing
What do you do with the measurements? Much statistical analysis assumes a Gaussian distribution, but computer response times may not be Gaussian, so the data need careful interpretation.
For a Gaussian measurement (a good assumption where it holds), the mean is a sensible measure of behaviour and sigma gives a good measure of the width. Even here there may be some asymmetry: is it significant?
Do you have enough data to draw robust conclusions? Are your results repeatable?
Extrapolation: remember that uncertainties must also be propagated. New effects may occur, and things are not always linear. Quote a likely range.
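A minimal Python sketch of summarising repeated timings under the Gaussian assumption, using only the standard `statistics` module. The function name `summarise` and the sample timings are hypothetical; the mean-minus-median value is only a crude hint of asymmetry, not a significance test.

```python
import math
import statistics


def summarise(times: list[float]) -> dict[str, float]:
    """Mean, sigma and standard error of repeated timings, plus a crude
    asymmetry hint (mean minus median). Assumes roughly Gaussian timings."""
    mean = statistics.fmean(times)
    sigma = statistics.stdev(times)                   # sample standard deviation (n - 1)
    std_err = sigma / math.sqrt(len(times))           # uncertainty on the mean itself
    skew_hint = mean - statistics.median(times)       # far from 0 suggests asymmetry
    return {"mean": mean, "sigma": sigma, "std_err": std_err, "mean_minus_median": skew_hint}


if __name__ == "__main__":
    # Hypothetical repeated wall-clock timings in seconds; one slow outlier.
    print(summarise([1.02, 0.98, 1.01, 0.99, 1.00, 1.35]))
```

The standard error shrinks as more runs are added, which is one way to judge whether there is enough data; repeatability still has to be checked by rerunning under the same conditions.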

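4 MTTF
What does this mean? MTTF is the Mean Time To Failure; MTTR is the Mean Time To Repair.
Example: a disk with an MTTF of 1,200,000 hours and a lifetime of about 43,800 hours (5 years).
The MTTF is measured by taking a large number of disks, say 10,000, running them for, say, 2,400 hours (about 100 days), and counting the failures:
$$\text{MTTF} = \frac{\text{number of hours run}}{\text{number of failed disks}} = \frac{10{,}000 \times 2{,}400}{20} = 1{,}200{,}000 \text{ hours}$$
So one disk running for one year (8,760 hours) has roughly an 8,760/1,200,000 ≈ 0.73% chance of failing, or around 3.7% (43,800/1,200,000) over its lifetime. This assumes all the disks are within their lifetime.
Failures are correlated: a manufacturing fault or an environmental insult can hit many disks at once. Back up.
Examples: last year at RAL, disk failures every few days; Brunel Grid node motherboards.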
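A minimal Python sketch of the MTTF estimate and the resulting failure probability. The helper names are hypothetical, and the probability uses the exponential-lifetime assumption (constant failure rate) that the reliability slides adopt; for periods much shorter than the MTTF it agrees closely with the simple ratio hours/MTTF.

```python
import math


def mttf_from_test(units: int, hours_each: float, failures: int) -> float:
    """Estimate MTTF as total device-hours divided by the number of failures."""
    return units * hours_each / failures


def failure_probability(hours: float, mttf: float) -> float:
    """P(failure within `hours`), assuming an exponential lifetime (constant failure rate)."""
    return 1.0 - math.exp(-hours / mttf)


if __name__ == "__main__":
    mttf = mttf_from_test(units=10_000, hours_each=2_400, failures=20)
    print(mttf)                                        # 1200000.0
    print(f"{failure_probability(8_760, mttf):.2%}")   # one year: 0.73%
    print(f"{failure_probability(43_800, mttf):.2%}")  # five-year lifetime: 3.58%
```

Note that correlated failures (a bad batch, a cooling fault) break the independence behind this calculation, which is why backups are still needed.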

5 Metrics
Measurement of performance:
$$\text{Performance} = \frac{1}{\text{Time}}$$
Time: elapsed time, response time or wall-clock time, i.e. the time from submission to retrieval. Do you include network transfer? Queuing?
Terms: CPU time (user + system), wall time, clock period, clock rate (MHz/GHz), clock cycles (ticks), CPI.
$$\text{Clock rate} = \frac{1}{\text{Clock period}}$$
$$\text{CPU (execution) time for a programme} = \text{CPU clock cycles for the programme} \times \text{Clock period}$$
There is a design trade-off: powerful instruction sets take fewer instructions per programme, but more time per instruction. The average number of clock cycles per instruction is referred to as CPI, which gives an alternative expression for execution time:
$$\text{Time} = \frac{\text{Instructions}}{\text{Programme}} \times \text{CPI} \times \text{Clock period}$$
Both CPU-time expressions ignore latency from any other cause (waiting for I/O, the network or a queue).
Constraints:
Hard real-time constraint: a fly-by-wire system must respond within a maximum time.
Soft real-time constraint: iPod playback must deliver the stream within a maximum time, most of the time.
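A minimal Python sketch of the execution-time formula and of the difference between wall time and CPU time. The instruction counts, CPI values and 2 GHz clock below are hypothetical numbers chosen only to show the trade-off; `time.perf_counter` measures wall-clock time and `time.process_time` measures user + system CPU time of the current process, and `time.sleep` stands in for waiting on I/O or a queue.

```python
import time


def cpu_time_seconds(instructions: float, cpi: float, clock_rate_hz: float) -> float:
    """Time = Instructions/program x CPI x clock period, with clock period = 1/clock rate."""
    clock_period = 1.0 / clock_rate_hz
    return instructions * cpi * clock_period


def wall_vs_cpu(sleep_seconds: float) -> tuple[float, float]:
    """Measure wall-clock time and CPU time (user + system) of a task that mostly waits."""
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    sum(i * i for i in range(200_000))   # a little real computation
    time.sleep(sleep_seconds)            # waiting: adds wall time, almost no CPU time
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


if __name__ == "__main__":
    # Hypothetical trade-off: a richer instruction set runs fewer instructions at a higher CPI.
    simple = cpu_time_seconds(instructions=2.0e9, cpi=1.0, clock_rate_hz=2.0e9)
    rich = cpu_time_seconds(instructions=1.2e9, cpi=2.0, clock_rate_hz=2.0e9)
    print(f"simple ISA: {simple:.2f} s, rich ISA: {rich:.2f} s")  # simple ISA: 1.00 s, rich ISA: 1.20 s
    wall, cpu = wall_vs_cpu(0.2)
    print(f"wall {wall:.3f} s, CPU {cpu:.3f} s")
```

Whether the richer instruction set wins depends entirely on the product of the three factors, not on any one of them.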

6 Non-Gaussian
How do you deal with non-Gaussian results?
Display the full results; this may be the only way.
Give the full range: minimum to maximum.
Give the 90% range about some suitable point: around the mean, mode or median; up from the smallest value; down from the largest; or from 5% to 95%.
Compare with a model (for example, two Gaussians) and give the model parameters (plus errors).
Ask what the results are for.
Don't just give the mean!
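A minimal Python sketch of reporting a non-Gaussian result by its range, median and 5% to 95% range rather than the mean alone. The function name and the bimodal response times are hypothetical; `statistics.quantiles(..., n=20)` returns the 19 cut points at 5%, 10%, ..., 95%.

```python
import statistics


def describe_non_gaussian(data: list[float]) -> dict[str, float]:
    """Report the full range, median and 5%-95% range instead of only the mean."""
    cuts = statistics.quantiles(data, n=20, method="inclusive")  # 5%, 10%, ..., 95%
    return {
        "min": min(data),
        "p5": cuts[0],
        "median": statistics.median(data),
        "mean": statistics.fmean(data),
        "p95": cuts[-1],
        "max": max(data),
    }


if __name__ == "__main__":
    # Hypothetical bimodal response times (seconds): most requests fast, some slow.
    times = [0.1] * 80 + [2.0] * 20
    print(describe_non_gaussian(times))
```

For this data the mean is 0.48 s, a value that describes neither the fast requests nor the slow ones; the median and the 5% to 95% range show both groups.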

7 SPEC
Standard benchmarks. SPEC is an industry-standard set of benchmarks. It measures the amount of time taken to finish a task. The set of tasks is meant to reflect a typical real-world mix of tasks, and the weighting is also meant to reflect real-world weighting.
A new version is produced every few years: SPEC CPU92, CPU95, CPU2000, CPU2006. This is because:
1. Performance increases; if the suite were not updated, the times for some tasks would become so small as to be meaningless.
2. The nature of a suitable set of tasks changes.
3. Manufacturers tune their machines and compilers to perform well on the benchmarks. The suite is reviewed to ensure it continues to provide a real measure of performance; otherwise it will be misleading.

8 Summary
How do you reduce execution time on a number of different programs to a single number? What should you use?
The arithmetic average of the execution times of all programs? The programs vary in speed, so this gives an implicit weighting: long-running programs dominate. An explicit weighting is possible, but the mix is already supposed to be representative, and weighting would encourage companies to press for reweighting.
SPECRatio: normalise execution times to a reference computer.
$$\text{SPECRatio} = \frac{\text{Time on reference computer}}{\text{Time on computer being rated}}$$
Note the ratio for two machines A and B. If SPECRatio(A) = 1.25 × SPECRatio(B), then
$$1.25 = \frac{\text{SPECRatio}(A)}{\text{SPECRatio}(B)} = \frac{\text{Time on Ref} / \text{Time on A}}{\text{Time on Ref} / \text{Time on B}} = \frac{\text{Time on B}}{\text{Time on A}}$$
The actual reference machine is unimportant.
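A minimal Python sketch showing that the reference time cancels when two machines are compared by SPECRatio. The run times (40 s on A, 50 s on B) and the two reference times are hypothetical values chosen for illustration.

```python
def spec_ratio(time_on_reference: float, time_on_machine: float) -> float:
    """SPECRatio = time on reference computer / time on computer being rated."""
    return time_on_reference / time_on_machine


if __name__ == "__main__":
    # Hypothetical run times (seconds) for one program.
    time_a, time_b = 40.0, 50.0
    for time_ref in (100.0, 300.0):   # two different reference machines
        ratio = spec_ratio(time_ref, time_a) / spec_ratio(time_ref, time_b)
        print(time_ref, ratio)        # ratio is 1.25 = time_b / time_a either way
```

Changing the reference machine rescales every SPECRatio by the same factor, so comparisons between rated machines are unaffected.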

9 Summary
How do you aggregate the ratios for the different programs? Use the geometric mean:
$$\text{Geometric mean} = \sqrt[n]{\prod_{i=1}^{n} \text{SPECRatio}_i}$$
The geometric mean of the ratios is the same as the ratio of the geometric means, so again the choice of reference computer is irrelevant. The arithmetic mean does not have this property.
SPECRatio for different programs on machines A and B, and their ratio:

Program     A       B       A/B
1           1.3     1.2     1.083333
2           2.2     2.1     1.047619
3           0.9     1.0     0.900000
4           1.7     1.75    0.971429
5           0.8     0.75    1.066667
6           1.1     1.1     1.000000
7           0.85    0.8     1.062500

Mean        A          B          Ratio of the means   Mean of the ratios
Geometric   1.184583   1.164885   1.016910             1.016910
Arithmetic  1.264286   1.242857   1.017241             1.018793
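A minimal Python sketch that recomputes the summary rows from the seven SPECRatio pairs on this slide, using `statistics.geometric_mean` and `statistics.fmean` (Python 3.8 or later). It shows the geometric ratio of means equal to the geometric mean of ratios, while the arithmetic versions differ.

```python
import statistics

# SPECRatio values for machines A and B on seven programs (from the table on this slide).
a = [1.3, 2.2, 0.9, 1.7, 0.8, 1.1, 0.85]
b = [1.2, 2.1, 1.0, 1.75, 0.75, 1.1, 0.8]
ratios = [x / y for x, y in zip(a, b)]

geo_of_means = statistics.geometric_mean(a) / statistics.geometric_mean(b)
geo_of_ratios = statistics.geometric_mean(ratios)
arith_of_means = statistics.fmean(a) / statistics.fmean(b)
arith_of_ratios = statistics.fmean(ratios)

print(f"geometric:  ratio of means {geo_of_means:.6f}, mean of ratios {geo_of_ratios:.6f}")
print(f"arithmetic: ratio of means {arith_of_means:.6f}, mean of ratios {arith_of_ratios:.6f}")
```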

10 Reliability
Equal means are not (always) equally useful.
[Figure: SPECfpRatio for each SPEC CFP2000 benchmark (wupwise, swim, mgrid, applu, mesa, galgel, art, equake, facerec, ammp, lucas, fma3d, sixtrack, apsi) on two machines. Top: GM = 2712, GStDev = 1.98, marked values 1372, 2712 and 5362. Bottom: GM = 2086, GStDev = 1.40, marked values 1494, 2086 and 2911.]
Two distributions, both with similar means. The top distribution is less useful: if your job looks like art or galgel, its performance is significantly underestimated; if it looks like the others, it is overestimated. For the bottom distribution, whichever benchmark most resembles your job, the mean is a good measure.
Beware: manufacturers can tune to the benchmark, for example with special compiler switches. 70% of SPEC programs were dropped from the next release as no longer useful.
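A minimal Python sketch of the geometric mean (GM) and geometric standard deviation (GStDev) used in the figure, with the ±1 GStDev band GM/GStDev to GM×GStDev. It assumes the population standard deviation of the logarithms (the sample, n - 1, form is also common), and the two small datasets are hypothetical.

```python
import math
import statistics


def geometric_summary(values: list[float]) -> tuple[float, float, tuple[float, float]]:
    """Geometric mean, geometric standard deviation and the +/-1 GStDev band.
    Uses the population standard deviation of ln(values)."""
    logs = [math.log(v) for v in values]
    gm = math.exp(statistics.fmean(logs))
    gsd = math.exp(statistics.pstdev(logs))
    return gm, gsd, (gm / gsd, gm * gsd)


if __name__ == "__main__":
    # Two hypothetical machines with the same geometric mean but different spread.
    print(geometric_summary([2.0, 8.0]))   # GM about 4, GStDev about 2: band about 2 to 8
    print(geometric_summary([4.0, 4.0]))   # GM about 4, GStDev 1: band collapses to 4
    # The slide's top machine: GM 2712, GStDev 1.98 gives roughly 1370 to 5370.
    print(2712 / 1.98, 2712 * 1.98)
```

The slide's marked values (1372 and 5362) match GM/GStDev and GM×GStDev up to the rounding of GStDev to 1.98; a wide band means the single mean says little about any one job.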

11 SPEC CPU2000
List of benchmarks.
Integer (CINT2000):
gzip: compression
vpr: FPGA circuit placement
gcc: GNU C compiler
mcf: combinatorial optimisation
crafty: chess program
parser: word processing
eon: visualisation
perlbmk: Perl application
gap: group theory
vortex: OO database
bzip2: compression
twolf: place and route simulator
Floating point (CFP2000):
wupwise: quantum chromodynamics
swim: shallow water model
mgrid: 3D potential field
applu: elliptic PDE solver
mesa: 3D graphics
galgel: CFD
art: image recognition
equake: seismic wave propagation
facerec: face recognition
ammp: computational chemistry
lucas: primality testing
fma3d: crash simulation
sixtrack: HEP accelerator design
apsi: meteorology
A number are easy to scale up: gcc (compile a bigger programme) and the simulations (increase the size or the mesh density): sixtrack, wupwise, swim, mgrid, equake.

12 Calculating reliability
Assume modules have exponentially distributed lifetimes (in practice the failure rate over time looks more like a U shape). Then the age of a module does not affect its failure probability.
One power supply has an MTTF of 100,000 hours. With a dual power supply, what is the expected time to the first failure? 50,000 hours. The time to system failure is of course longer, but replacements are more frequent: more costly and more time consuming.
Failure time for a system of 10 disks, each with an MTTF of 1 million hours, a disk controller with an MTTF of ½ million hours, and a power supply with an MTTF of 1/5 million hours. The failure rates are: power supply 1/200,000, controller 1/500,000, and each disk 1/1,000,000, but there are ten of them. The system failure rate is the sum of the individual failure rates:
$$\text{Failure rate} = 10 \times \frac{1}{1{,}000{,}000} + \frac{1}{500{,}000} + \frac{1}{200{,}000} = \frac{17}{1{,}000{,}000} \text{ per hour}$$
$$\text{MTTF} = \frac{1{,}000{,}000}{17} \approx 58{,}800 \text{ hours}$$
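A minimal Python sketch of the calculation above: for independent components with exponentially distributed lifetimes, where the system fails as soon as any one component fails, the failure rates (1/MTTF) add. The function name `system_mttf` is hypothetical.

```python
def system_mttf(component_mttfs_hours: list[float]) -> float:
    """MTTF of a system that fails when any component fails, assuming independent
    components with exponentially distributed lifetimes: the failure rates add."""
    total_rate = sum(1.0 / mttf for mttf in component_mttfs_hours)  # failures per hour
    return 1.0 / total_rate


if __name__ == "__main__":
    disks = [1_000_000.0] * 10
    controller = [500_000.0]
    power_supply = [200_000.0]
    print(round(system_mttf(disks + controller + power_supply)))  # 58824
    # Dual power supply: expected time until the first of the two fails.
    print(system_mttf([100_000.0, 100_000.0]))  # 50000.0
```

The same function gives the dual power supply figure: two units double the failure rate, so the first failure is expected after half the single-unit MTTF.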

13 Calculating reliability
MTTR: mean time to repair. When asking about reliability it is also important to ask how long it takes to fix a problem: a very unlikely but long break versus a likely but minimal break. The probable time lost is the probability of a break × the time to repair; sum over all such incidents to estimate the downtime.
RAID works because, although the MTTF of its disks is shorter than for high-spec disks, the MTTR seen by the user can be zero: a failed disk can be replaced while the array keeps running.
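A minimal Python sketch of the downtime estimate (probability of a break × time to repair, summed over incident types). The incident probabilities and repair times are hypothetical, and the `availability` helper uses the standard definition MTTF / (MTTF + MTTR), which is not stated on the slide.

```python
def expected_downtime(incidents: list[tuple[float, float]]) -> float:
    """Sum of probability of a break x time to repair over all incident types."""
    return sum(probability * hours_to_repair for probability, hours_to_repair in incidents)


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the service is up: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical incident types over one year: (probability of a break, hours to repair).
    incidents = [(0.01, 100.0),   # very unlikely but long break
                 (0.5, 1.0)]      # likely but minimal break
    print(expected_downtime(incidents))                       # 1.5 (hours)
    # A RAID set whose failed disk is hot-swapped: the user sees an MTTR of zero.
    print(availability(mttf_hours=58_800.0, mttr_hours=0.0))  # 1.0
```

In this example the rare long break contributes twice as much expected downtime as the frequent short one, which is why both probability and repair time have to be asked about.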

14 Scaling
Subtle problems. Assume you want to run two jobs with equal computing requirements. Each takes 6 hours on core A and 12 hours on core B. Compare a chip with one A core with a chip with two B cores: the time to finish both jobs is the same (12 hours). Are the systems equivalent? The two-core chip corresponds to the current paradigm. However:
The memory requirement doubles.
The number of I/O channels to files and databases doubles.
The I/O rate is fixed, but there is per-channel overhead.
The number of jobs handled simultaneously by the scheduler increases.
Beware.
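A minimal Python sketch of the comparison: both chips finish the two jobs in the same time, but the two-core chip holds twice as many jobs in flight, which is what drives the extra memory, I/O channels and scheduler load. The helper names are hypothetical, and the model assumes identical, independent jobs with one job per core at a time.

```python
import math


def makespan_hours(jobs: int, cores: int, hours_per_job: float) -> float:
    """Wall time to finish identical independent jobs, one job per core at a time."""
    return math.ceil(jobs / cores) * hours_per_job


def jobs_in_flight(jobs: int, cores: int) -> int:
    """Jobs resident at the same time (each needs its own memory and I/O channels)."""
    return min(jobs, cores)


if __name__ == "__main__":
    one_fast = makespan_hours(jobs=2, cores=1, hours_per_job=6.0)    # chip with one A core
    two_slow = makespan_hours(jobs=2, cores=2, hours_per_job=12.0)   # chip with two B cores
    print(one_fast, two_slow)                          # 12.0 12.0
    print(jobs_in_flight(2, 1), jobs_in_flight(2, 2))  # 1 2
```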