ECM and E2CM performance under bursty traffic. Cyriel Minkenberg & Mitch Gusat, IBM Research GmbH, Zurich, April 26, 2007



Target

Study Output-Generated (OG) single-hop congestion with bursty injection processes.

Conditions, parameters, simulation environment:
- Traffic: non-Pareto temporal burstiness of source injection
  - i.i.d. bursty arrivals
  - geometrically distributed burst size around mean B = [1.2, 12, 48, 120] us
- LL-FC: runs with and without PAUSE
- CM: none, ECM, and E2CM
- Metrics: TP_aggr, TP_hot, Q_hot, frame drops

For details, see the fine print page.
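The mean burst sizes are stated as durations. The sketch below converts them to frame counts and draws geometrically distributed burst lengths. It assumes a 10 Gb/s line rate (implied by "load = 85% (8.5 Gb/s)") and the fixed 1500 B frame size from the fine print. The function names are illustrative and are not taken from the authors' simulator.

```python
import random

LINK_RATE_BPS = 10e9   # assumption: 10 Gb/s, implied by "85% (8.5 Gb/s)"
FRAME_BYTES = 1500     # fixed frame size stated in the fine print


def frame_time_us():
    """Serialization time of one frame on the link, in microseconds."""
    return FRAME_BYTES * 8 / LINK_RATE_BPS * 1e6


def mean_burst_frames(mean_burst_us):
    """Mean burst size expressed in frames instead of microseconds."""
    return mean_burst_us / frame_time_us()


def sample_burst_frames(mean_frames, rng):
    """Draw a geometric burst length (at least 1 frame) with the given mean."""
    p = 1.0 / mean_frames          # per-frame probability that the burst ends
    n = 1
    while rng.random() >= p:
        n += 1
    return n


if __name__ == "__main__":
    rng = random.Random(42)        # hypothetical seed, for repeatability only
    for b_us in (1.2, 12, 48, 120):
        frames = mean_burst_frames(b_us)
        burst = sample_burst_frames(frames, rng)
        print(f"B = {b_us:>5} us -> mean {frames:.0f} frames, one sample: {burst}")
```

Under these assumptions one frame occupies the link for 1.2 us, so the four settings correspond to mean bursts of roughly 1, 10, 40, and 100 frames.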

Output-Generated Single-Hop Hotspot

[Figure: N = 16 nodes (Node 1, Node 2, ..., Node N) attached to a core switch; each node injects at 85% load; Node 1 service rate = 10%.]

- All nodes: uniform destination distribution, load = 85% (8.5 Gb/s)
- Node 1 service rate = 10%
- One congestion point
  - Hotspot degree = N-1
  - All flows affected
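The severity of this hotspot follows from simple arithmetic. The sketch below works it out, assuming a 10 Gb/s link (implied by "85% (8.5 Gb/s)") and that the 10% service rate is a fraction of that link rate. The function name and return layout are illustrative.

```python
def hot_port_overload(n_nodes, load=0.85, link_gbps=10.0, service_fraction=0.10):
    """Return (offered Gb/s, service Gb/s, overload factor) for the hot port.

    Each node spreads its load uniformly over the other n_nodes - 1 nodes,
    and n_nodes - 1 sources send to Node 1, so the two factors cancel.
    """
    per_flow_gbps = load * link_gbps / (n_nodes - 1)   # one source -> one destination
    offered_gbps = (n_nodes - 1) * per_flow_gbps       # all sources -> Node 1
    service_gbps = service_fraction * link_gbps        # assumed: 10% of line rate
    return offered_gbps, service_gbps, offered_gbps / service_gbps


if __name__ == "__main__":
    offered, service, factor = hot_port_overload(16)
    print(f"offered {offered:.2f} Gb/s, service {service:.2f} Gb/s, overload x{factor:.1f}")
```

Under these assumptions Node 1 is offered about 8.5 Gb/s but drains only 1 Gb/s, an overload factor of about 8.5 that does not depend on N.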

Simulation Setup & Parameters (the fine print)

Traffic
- I.i.d. bursty arrivals, geometrically distributed burst size around mean B
- B = [1.2, 12, 48, 120] us
- Uniform destination distribution (to all nodes except self)
- Fixed frame size = 1500 B

Scenario
1. Single-hop output-generated hotspot

Switch
- N = 16
- M = 300 KB/port
- Partitioned memory per input, shared among all outputs
- No limit on per-output memory usage
- PAUSE enabled or disabled
  - Applied on a per-input basis, based on local high/low watermarks
  - watermark_high = 260 KB
  - watermark_low = 230 KB
  - If disabled, frames are dropped when the input partition is full

Adapter
- Per-node virtual output queuing, round-robin scheduling
- No limit on number of rate limiters
- Ingress buffer size = 1500 KB, partitioned across VOQs; per-flow selective source quench used when a VOQ is full; round-robin VOQ service
- Egress buffer size = 150 KB
- PAUSE enabled
  - watermark_high = 150 - rtt*bw KB
  - watermark_low = watermark_high - 10 KB

ECM
- W = 2.0
- Q_eq = 75 KB (= M/4)
- G_d = 0.5 / ((2*W+1)*Q_eq)
- G_i0 = (R_link / R_unit) * ((2*W+1)*Q_eq)
- G_i = 0.1 * G_i0
- P_sample = 2% (on average 1 sample every 75 KB)
- R_unit = R_min = 1 Mb/s
- BCN_MAX enabled, threshold = 260 KB
- No BCN(0,0), no self-increase

E2CM (per-flow)
- W = 2.0
- Q_eq,flow = 15 KB
- G_d,flow = 0.5 / ((2*W+1)*Q_eq,flow)
- G_i,flow = 0.005 * (R_link / R_unit) / ((2*W+1)*Q_eq,flow)
- P_sample = 2% (on average 1 sample every 75 KB)
- R_unit = R_min = 1 Mb/s
- BCN_MAX enabled, threshold = 52 KB
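Several of these parameters are given as formulas. The sketch below evaluates the decrease gains, the per-flow E2CM increase gain, and the mean sampling interval. Its assumptions: queue sizes are in KB with 1 KB = 1000 B, R_link = 10 Gb/s (implied by the 8.5 Gb/s load figure), and all Python names are illustrative. The ECM increase gain G_i0 is deliberately left out.

```python
W = 2.0                  # weight from the parameter list
Q_EQ_KB = 75.0           # ECM equilibrium queue length
Q_EQ_FLOW_KB = 15.0      # E2CM per-flow equilibrium queue length
R_LINK_MBPS = 10_000.0   # assumption: 10 Gb/s link
R_UNIT_MBPS = 1.0        # R_unit = R_min = 1 Mb/s
FRAME_KB = 1.5           # 1500 B frame, assuming 1 KB = 1000 B
P_SAMPLE = 0.02          # 2% sampling probability


def decrease_gain(q_eq_kb, w=W):
    """G_d = 0.5 / ((2*W+1) * Q_eq), in units of 1/KB."""
    return 0.5 / ((2 * w + 1) * q_eq_kb)


def e2cm_increase_gain(q_eq_flow_kb, w=W):
    """G_i,flow = 0.005 * (R_link / R_unit) / ((2*W+1) * Q_eq,flow)."""
    return 0.005 * (R_LINK_MBPS / R_UNIT_MBPS) / ((2 * w + 1) * q_eq_flow_kb)


def mean_sampling_interval_kb(p=P_SAMPLE, frame_kb=FRAME_KB):
    """Average amount of traffic between two samples."""
    return frame_kb / p


if __name__ == "__main__":
    print("G_d      =", decrease_gain(Q_EQ_KB))
    print("G_d,flow =", decrease_gain(Q_EQ_FLOW_KB))
    print("G_i,flow =", e2cm_increase_gain(Q_EQ_FLOW_KB))
    print("sampling interval (KB) =", mean_sampling_interval_kb())
```

With W = 2 the factor (2*W+1) equals 5, so G_d scales a feedback value of magnitude 5*Q_eq to 0.5. A 2% sampling probability with 1500 B frames gives one sample per 50 frames, which matches the stated 75 KB.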

Results (figures), each with four panels for mean burst size = 1.2, 12, 48, and 120 us:
- Aggregate throughput, PAUSE disabled
- Aggregate throughput, PAUSE enabled
- Hot port throughput, PAUSE disabled
- Hot port throughput, PAUSE enabled
- Hot queue length, PAUSE disabled
- Hot queue length, PAUSE enabled

Frame drops (PAUSE disabled)

[Figure: number of frames dropped (log scale, 100 to 10,000,000) per congestion management scheme (No CM, ECM, E2CM), one series per mean burst size (1.2, 12, 48, 120 us).]

Conclusions to Bursty OG

- For high burstiness, CM improves aggregate throughput even without a hotspot (no PAUSE)
- Difficulty (of control) is proportional to 1/B
- As mean burst size increases:
  - Aggregate throughput recovers faster
  - Queue stabilizes more quickly (1st overshoot)
  - Frame drops are fewer (without PAUSE), except for a sweet-spot anomaly at B = 48 us for E2CM
- Future work: FCT metric
  - Not trivial to generate a standard workload and use standard measurements...
  - Using trace-based simulation?

ECM and E2CM performance in large switch configurations: Single-Hop High Degree Hotspot. Cyriel Minkenberg & Mitch Gusat, IBM Research GmbH, Zurich, April 26, 2007

Targets

1. Study the Output-Generated (OG) single-hop scenario with high hotspot degree (HSD) congestion
2. First look at E2CM with continuous probing (Pat's suggestion in the sim ad hoc call of April 12th)

Conditions, parameters, simulation environment:
- Traffic: i.i.d. Bernoulli arrivals
- LL-FC: runs with and without PAUSE
- CM: No CM, ECM, E2CM, E2CM-CP
- Metrics: TP_aggr, TP_hot, Q_hot, frame drops

For details, see the fine print page.

Output-Generated Single-Hop High HSD

[Figure: N = {16, 32, 64, 128, 256} nodes (Node 1, Node 2, ..., Node N) attached to a core switch; each node injects at 85% load; Node 1 service rate = 10%.]

- All nodes: uniform destination distribution, load = 85% (8.5 Gb/s)
- Node 1 service rate = 10%
- One congestion point
  - Hotspot degree = N-1
  - All flows affected
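One way to see why a high hotspot degree is demanding is to look at the rate each hot flow would get if the hot port's capacity were split evenly. The sketch below computes that figure for the simulated radices. The even split is an illustrative assumption, not a metric from the slides, and the 1 Gb/s hot-port service rate assumes a 10 Gb/s link.

```python
HOT_SERVICE_MBPS = 1_000.0   # assumption: 10% of a 10 Gb/s link
R_MIN_MBPS = 1.0             # R_unit = R_min = 1 Mb/s from the parameter list


def fair_share_mbps(n_nodes, service_mbps=HOT_SERVICE_MBPS):
    """Even split of the hot port's service rate over its N-1 incoming flows."""
    return service_mbps / (n_nodes - 1)


if __name__ == "__main__":
    for n in (16, 32, 64, 128, 256):
        share = fair_share_mbps(n)
        print(f"N = {n:3d}: {share:6.2f} Mb/s per hot flow "
              f"({share / R_MIN_MBPS:5.1f} x R_min)")
```

Under this assumption the per-flow share falls from about 66.7 Mb/s at N = 16 to about 3.9 Mb/s at N = 256, only a few multiples of R_min. Each source would therefore have to throttle its hot flow far more deeply at the larger radices.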

Simulation Setup & Parameters (same as before)

Traffic
- I.i.d. Bernoulli arrivals, geometrically distributed burst size around mean B
- Uniform destination distribution (to all nodes except self)
- Fixed frame size = 1500 B

Scenario
1. Single-hop output-generated hotspot

Switch
- Radix N = [16, 32, 64, 128, 256]
- M = 300 KB/port
- Partitioned memory per input, shared among all outputs
- No limit on per-output memory usage
- PAUSE enabled or disabled
  - Applied on a per-input basis, based on local high/low watermarks
  - watermark_high = 260 KB
  - watermark_low = 230 KB
  - If disabled, frames are dropped when the input partition is full
- E2CM-CP = E2CM with continuous probing, i.e., probing is always active

Adapter
- Per-node virtual output queuing, round-robin scheduling
- No limit on number of rate limiters
- Ingress buffer size = 1500 KB, partitioned across VOQs; per-flow selective source quench used when a VOQ is full; round-robin VOQ service
- Egress buffer size = 150 KB
- PAUSE enabled
  - watermark_high = 150 - rtt*bw KB
  - watermark_low = watermark_high - 10 KB

ECM
- W = 2.0
- Q_eq = 75 KB (= M/4)
- G_d = 0.5 / ((2*W+1)*Q_eq)
- G_i0 = (R_link / R_unit) * ((2*W+1)*Q_eq)
- G_i = 0.1 * G_i0
- P_sample = 2% (on average 1 sample every 75 KB)
- R_unit = R_min = 1 Mb/s
- BCN_MAX enabled, threshold = 260 KB
- No BCN(0,0), no self-increase

E2CM (per-flow)
- W = 2.0
- Q_eq,flow = 15 KB
- G_d,flow = 0.5 / ((2*W+1)*Q_eq,flow)
- G_i,flow = 0.005 * (R_link / R_unit) / ((2*W+1)*Q_eq,flow)
- P_sample = 2% (on average 1 sample every 75 KB)
- R_unit = R_min = 1 Mb/s
- BCN_MAX enabled, threshold = 52 KB
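The per-input PAUSE rule with high and low watermarks amounts to a hysteresis loop. The sketch below shows that behaviour with the switch thresholds listed above (260 KB and 230 KB). The class name, the inclusive comparisons, and the idea of polling the occupancy are assumptions made for illustration, since the slides give only the two thresholds.

```python
class PauseWatermarks:
    """Hysteresis between a high (assert PAUSE) and a low (release) watermark."""

    def __init__(self, high_kb=260.0, low_kb=230.0):
        if low_kb >= high_kb:
            raise ValueError("low watermark must be below high watermark")
        self.high_kb = high_kb
        self.low_kb = low_kb
        self.paused = False

    def update(self, occupancy_kb):
        """Feed the current input-partition occupancy; return the PAUSE state."""
        if not self.paused and occupancy_kb >= self.high_kb:
            self.paused = True       # crossed the high watermark: pause upstream
        elif self.paused and occupancy_kb <= self.low_kb:
            self.paused = False      # drained to the low watermark: resume
        return self.paused


if __name__ == "__main__":
    pause = PauseWatermarks()
    for occ in (100, 259, 260, 250, 231, 230, 240):
        print(f"occupancy {occ:3d} KB -> paused = {pause.update(occ)}")
```

The 30 KB gap between the watermarks keeps the PAUSE state from toggling on every frame while the occupancy hovers near a single threshold.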

Results (figures), each with four panels: ECM, E2CM, E2CM-CP, and No CM:
- Aggregate throughput, PAUSE disabled
- Aggregate throughput, PAUSE enabled
- Hot port throughput, PAUSE disabled
- Hot port throughput, PAUSE enabled
- Hot queue length, PAUSE disabled
- Hot queue length, PAUSE enabled

Frame drops (PAUSE disabled)

[Figure: number of frames dropped (log scale, 100 to 100,000,000) per congestion management scheme (No CM, ECM, E2CM, E2CM-CP), one series per number of nodes (16, 32, 64, 128, 256).]

Simulation duration per run

[Figure: simulation duration in minutes (0 to 450) versus number of nodes (16, 32, 64, 128, 256).]

When the number of nodes doubles, the simulation time triples.
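The "doubles/triples" observation implies a power-law scaling of run time with node count. The sketch below derives the exponent and the relative cost across the simulated range. The reference duration of 1.0 is a placeholder unit, not a measured value, and the function names are illustrative.

```python
import math


def scaling_exponent(time_factor=3.0, size_factor=2.0):
    """Exponent k such that time ~ N**k when doubling N triples the time."""
    return math.log(time_factor) / math.log(size_factor)


def relative_duration(n_ref, n_target, exponent, t_ref=1.0):
    """Duration at n_target relative to a reference run at n_ref."""
    return t_ref * (n_target / n_ref) ** exponent


if __name__ == "__main__":
    k = scaling_exponent()
    print(f"exponent k = log2(3) = {k:.3f}")
    for n in (16, 32, 64, 128, 256):
        print(f"N = {n:3d}: {relative_duration(16, n, k):5.1f} x the N = 16 run")
```

The exponent is log2(3), about 1.585. Going from 16 to 256 nodes is four doublings, so the largest run costs about 3^4 = 81 times the smallest under this rule.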

Conclusions on High-HSD OG: A Corner Case?

- Recovery duration increases drastically with HSD
  - With 256 nodes, recovery exceeds the hotspot duration (400 ms) in all cases
- PAUSE makes no substantial difference, except that the accumulated backlog for cold ports causes overshoot when it is used
- E2CM with continuous probing performs (for this scenario) better than both baselines
- Persistent high HSD requires parameter tuning
  - Is this really a common case to be worried about, or rather a corner case?
  - Higher decrease gains?
  - Currently also testing use of BCN(0,0), as BCN_MAX does not result in sufficiently fast throttling