Scalability of delays in input queued switches


Scalability of delays in input queued switches
Paolo Giaccone
Notes for the class on Router and Switch Architectures
Politecnico di Torino, December 2011

Scalability of delays

$N \times N$ switch

Key question: how does the average delay $W$ scale with $N$, when $N \to \infty$?

Assumptions:
- Bernoulli i.i.d. arrivals with rate $\lambda_{ij} \in [0, 1]$ cell/slot at input $i$ for output $j$
- Admissible traffic: $\rho \in (0, 1)$ and
$$\sum_k \lambda_{kj} \le \rho \quad \forall j, \qquad \sum_k \lambda_{ik} \le \rho \quad \forall i$$

P. Giaccone (Politecnico di Torino) Delay and frame scheduling Dec. 2011

Output Queued (OQ) switch

Assume uniform traffic: $\lambda_{ij} = \rho/N$

Delay of an OQ switch:
$$W_{OQ} = 1 + \left(1 - \frac{1}{N}\right) \frac{\rho}{2(1-\rho)} \approx 1 + \frac{\rho}{2(1-\rho)} = O(1) \qquad (1)$$

Proof: each output queue is a slotted M/D/1 queue with binomial $(N, \rho/N)$ arrivals per slot and service time equal to one slot.
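To make the O(1) claim concrete, the following minimal sketch evaluates equation (1) for a few switch sizes. The function names `w_oq` and `w_oq_limit` are illustrative and not part of the original notes.

```python
def w_oq(n: int, rho: float) -> float:
    """Average delay (in slots) of an N x N output-queued switch, eq. (1)."""
    if not 0 <= rho < 1:
        raise ValueError("rho must be in [0, 1)")
    return 1 + (1 - 1 / n) * rho / (2 * (1 - rho))


def w_oq_limit(rho: float) -> float:
    """Value approached by eq. (1) as N grows; it does not depend on N."""
    return 1 + rho / (2 * (1 - rho))


# The delay stays below the N-independent limit for every switch size.
for n in (2, 16, 1024):
    print(n, round(w_oq(n, 0.9), 3), round(w_oq_limit(0.9), 3))
```

The delay depends on the load much more than on the switch size, which is what O(1) in N means here.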

Scheduling in Input Queued (IQ) switches

Queue-independent policies
- Arrival rates are known
- Frame scheduling
  - Random Frame scheduler
  - Periodic Frame scheduler

Queue-aware policies
- Arrival rates are unknown
- Slot-by-slot schedulers, e.g. MWM, iSLIP
- Queue-aware frame scheduler

Random Frame scheduling

Assume $\lambda_{ij} = \rho \mu_{ij}$ for some $\rho < 1$, where $[\mu_{ij}]$ is a doubly stochastic matrix:
$$\sum_k \mu_{ik} = \sum_k \mu_{kj} = 1$$

Thanks to the BvN theorem,
$$\mu_{ij} = \sum_k p_k M^k_{ij}$$
with $p_k \ge 0$, $\sum_k p_k = 1$, and $M^k = [M^k_{ij}]$ a matching matrix ($1 \le k \le N!$).

At each timeslot, the scheduler selects $M^k$ at random with probability $p_k$.

Delay of the Random Frame (RF) scheduler:
$$W_{RF} = \frac{N-1}{1-\rho} = O(N)$$

Proof - I

For a slotted Geom/Geom/1 queue with arrival probability $\lambda$ and service probability $\mu$, the average delay is
$$W_{Geom/Geom/1} = \frac{\eta}{\lambda(1-\eta)} \quad \text{with} \quad \eta = \frac{\lambda(1-\mu)}{\mu(1-\lambda)} \qquad (2)$$

In the random frame scheduler, VOQ$_{ij}$ is a Geom/Geom/1 queue with service probability
$$1 - \sum_k p_k \left(1 - M^k_{ij}\right) = \sum_k p_k M^k_{ij} = \mu_{ij}$$
and arrival probability $\lambda_{ij} = \rho \mu_{ij}$.

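Proof - II

Now
$$\eta = \frac{\rho(1-\mu_{ij})}{1-\rho\mu_{ij}}, \qquad 1-\eta = \frac{1-\rho}{1-\rho\mu_{ij}}$$

Recalling (2):
$$W_{ij} = \frac{1}{\rho\mu_{ij}} \cdot \frac{\rho(1-\mu_{ij})}{1-\rho\mu_{ij}} \cdot \frac{1-\rho\mu_{ij}}{1-\rho} = \frac{1-\mu_{ij}}{\mu_{ij}(1-\rho)}$$

The total arrival traffic is
$$\lambda_{tot} = \sum_{i,j} \lambda_{ij} = \rho N$$
and the overall average delay is
$$W_{RF} = \frac{1}{\lambda_{tot}} \sum_{i,j} \lambda_{ij} W_{ij} = \frac{1}{N\rho} \sum_{i,j} \frac{\lambda_{ij}(1-\mu_{ij})}{\mu_{ij}(1-\rho)} = \frac{1}{N\rho} \sum_{i,j} \frac{\rho(1-\mu_{ij})}{1-\rho} = \frac{1}{N\rho} \sum_{i,j} \frac{\rho-\lambda_{ij}}{1-\rho} = \frac{N^2\rho - N\rho}{N\rho(1-\rho)} = \frac{N-1}{1-\rho}$$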
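The algebra above can be checked numerically. The sketch below builds one particular doubly stochastic matrix (a convex combination of the N cyclic-shift matchings, which is an assumption made here only to keep every entry strictly positive), applies equation (2) to each VOQ, and compares the traffic-weighted average with the closed form (N-1)/(1-ρ). All function names are illustrative.

```python
import random


def circulant_doubly_stochastic(n: int, seed: int = 0) -> list[list[float]]:
    """Convex combination of the n cyclic-shift matchings (one possible BvN mix)."""
    rng = random.Random(seed)
    raw = [rng.uniform(0.1, 1.0) for _ in range(n)]
    total = sum(raw)
    p = [w / total for w in raw]  # p_k > 0 and sum_k p_k = 1
    # Shift k connects input i to output (i + k) mod n, so mu_ij = p_{(j - i) mod n}.
    return [[p[(j - i) % n] for j in range(n)] for i in range(n)]


def geom_geom_1_delay(lam: float, mu: float) -> float:
    """Average delay of a slotted Geom/Geom/1 queue, eq. (2)."""
    eta = lam * (1 - mu) / (mu * (1 - lam))
    return eta / (lam * (1 - eta))


def w_rf(mu: list[list[float]], rho: float) -> float:
    """Traffic-weighted average of the per-VOQ delays under random frame scheduling."""
    lam_tot = 0.0
    acc = 0.0
    for row in mu:
        for mu_ij in row:
            lam = rho * mu_ij
            lam_tot += lam
            acc += lam * geom_geom_1_delay(lam, mu_ij)
    return acc / lam_tot


n, rho = 8, 0.7
mu = circulant_doubly_stochastic(n)
print(w_rf(mu, rho), (n - 1) / (1 - rho))  # the two values should agree
```

The result does not depend on the particular weights p_k, as the derivation predicts.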

Periodic Frame scheduling

Assume uniform traffic: $\lambda_{ij} = \rho/N$

The scheduler serves each VOQ exactly every $N$ timeslots:
- fixed frame of $N$ timeslots
- during timeslot $t$, input $i$ is connected to output $(i + t) \bmod N$
- e.g. for $N = 3$ the frame is $(M_1, M_2, M_3)$:
$$M_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad M_2 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} \quad M_3 = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$$

Delay of Periodic Frame (PF) scheduling:
$$W_{PF} = 1 + \frac{N}{2(1-\rho)} = O(N)$$
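The frame construction can be sketched in a few lines. The helper `periodic_frame` is a hypothetical name, and it uses 0-based indices, so `frame[t]` corresponds to the matching used in timeslot t (the slide's M with index t + 1).

```python
def periodic_frame(n: int) -> list[list[list[int]]]:
    """Return the n matchings of a periodic frame as 0/1 matrices.

    In timeslot t, input i is connected to output (i + t) mod n.
    """
    frame = []
    for t in range(n):
        matching = [[0] * n for _ in range(n)]
        for i in range(n):
            matching[i][(i + t) % n] = 1
        frame.append(matching)
    return frame


for t, matching in enumerate(periodic_frame(3)):
    print(f"timeslot {t}:")
    for row in matching:
        print("  ", row)
```

Summing the matchings of one frame gives the all-ones matrix, which is the statement that each VOQ is served exactly once every N timeslots.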

Proof - I

Each VOQ is a slotted, single-server queue with arrival probability $\rho/N$, service duration of 1 slot and server vacation of $N - 1$ slots.

Now sample the state of the queue every $N$ slots, at each service opportunity. The sampling period is $N$ slots.

The sampled VOQ appears as a slotted M/D/1 queue with binomial $(N, \rho/N)$ arrivals and service time equal to one sampling period. For such a queue, we know (see (1)):
$$W_{M/D/1} \approx 1 + \frac{\rho}{2(1-\rho)}$$

Proof - II

The average delay in the original VOQ will be $N$ times the M/D/1 delay. In addition, the delay must include the average waiting time before being served, and must be reduced since the service time is just 1 slot and not $N$ slots as in the sub-sampled system.

$$W_{PF} = \frac{N}{2} + N W_{M/D/1} - (N-1) \approx \frac{N}{2} + N + \frac{N\rho}{2(1-\rho)} - N + 1 = \frac{N}{2} + 1 + \frac{N\rho}{2(1-\rho)} = 1 + \frac{N}{2}\left(1 + \frac{\rho}{1-\rho}\right) = 1 + \frac{N}{2}\left(\frac{1}{1-\rho}\right)$$

Queue-independent Frame Scheduling

For the previous queue-independent frame schedulers:
$$W_{RF} = O(N) \qquad W_{PF} = O(N)$$

General property: delay of a queue-independent frame scheduler

For any scheduling algorithm that operates independently of the queue size,
$$W_{\text{queue independent}} = O(N) \qquad (3)$$
as proved in
[1] M.J. Neely, E. Modiano, Y.S. Cheng, "Logarithmic Delay for N × N Packet Switches Under the Crossbar Constraint", IEEE/ACM Transactions on Networking, Vol. 15, No. 3, June 2007.

Comparing (3) with (1), queue-independent frame scheduling appears inefficient in terms of delay.

Generic Frame Scheduling

Occupancy matrix $L = [L_{ij}]$, where $L_{ij}$ is the size of VOQ$_{ij}$.

BvN theorem:
$$T = \max_{i=1,\ldots,N} \left\{ \sum_{k=1}^{N} L_{ik}, \; \sum_{k=1}^{N} L_{ki} \right\}$$
is the minimum clearance time for $L$.

Minimum clearance time and maximal size matchings

Any arbitrary sequence of maximal size matchings will be able to serve all packets of $L$ in $2T - 1$ timeslots.

Proof: a given packet can be delayed by at most $T - 1$ packets on the same input and by at most $T - 1$ packets on the same output. In total, each packet can be delayed by at most $2T - 2$ other packets.
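As a rough illustration of the two statements above, the sketch below computes the minimum clearance time T of an occupancy matrix and then drains the matrix with greedy maximal matchings, counting the slots used. The greedy rule (scan inputs in order, take the first free nonempty output) is an assumption chosen for brevity; any maximal matching would do. The example matrix is made up.

```python
def clearance_time(L: list[list[int]]) -> int:
    """Minimum clearance time T: the largest row or column sum of L."""
    n = len(L)
    rows = [sum(L[i]) for i in range(n)]
    cols = [sum(L[i][j] for i in range(n)) for j in range(n)]
    return max(rows + cols)


def greedy_maximal_matching(L: list[list[int]]) -> list[tuple[int, int]]:
    """Match each input to its first nonempty VOQ whose output is still free."""
    used_outputs = set()
    matching = []
    for i, row in enumerate(L):
        for j, backlog in enumerate(row):
            if backlog > 0 and j not in used_outputs:
                used_outputs.add(j)
                matching.append((i, j))
                break
    return matching


def slots_to_clear(L: list[list[int]]) -> int:
    """Serve one packet per matched VOQ per slot until L is empty."""
    L = [row[:] for row in L]  # work on a copy
    slots = 0
    while any(any(row) for row in L):
        for i, j in greedy_maximal_matching(L):
            L[i][j] -= 1
        slots += 1
    return slots


L = [[2, 0, 1],
     [1, 1, 0],
     [0, 2, 1]]
T = clearance_time(L)
print("T =", T, "slots used =", slots_to_clear(L), "bound =", 2 * T - 1)
```

The slot count always falls between T and 2T - 1; how close it gets to T depends on the matchings chosen.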

Queue-aware frame scheduling

- Fix the frame duration $T$
- Let $I_k = [Tk, \ldots, T(k+1) - 1]$ be the slots corresponding to the $k$th frame
- At the beginning of the $k$th frame, i.e. at slot $Tk$, the scheduler computes all the matchings for the future slots in $I_k$ based on just the arrivals in $I_{k-1}$
- Overflow packets are packets that arrived in $I_{k-1}$ and were not served in $I_k$
- We assume (for now) that overflow packets are dropped

Key idea
1. Choose $T$ large enough to (almost) avoid overflow packets
2. Delays are $O(T)$

Queue-aware frame scheduling

Let $A(T) = [a_{ij}(T)]$ be the cumulative number of packets arrived during the $k$th frame $I_k$.

By the BvN theorem, it is possible to serve all the packets and avoid overflow packets iff
$$\sum_k a_{ik} \le T \;\text{ for all } i \quad \text{and} \quad \sum_k a_{kj} \le T \;\text{ for all } j$$

We will show that if $T = \Theta(\log N)$:
- Pr(frame overflow) can become negligible
- delays become $O(\log N)$

Chernoff Bound

Theorem: Let $X_1, X_2, \ldots, X_n$ be independent binary random variables, with $p_i = \Pr(X_i = 1)$. Let $X = \sum_{i=1}^{n} X_i$ and $\mu = E[X]$. For any $\delta > 0$:
$$\Pr(X > (1+\delta)\mu) < \left( \frac{e^{\delta}}{(1+\delta)^{(1+\delta)}} \right)^{\mu} \qquad (4)$$

[Figure: P(X > (1+δ)µ) versus δ, logarithmic vertical axis, for µ = 1, 10, 100, 1000, 10000]

Proof from Wikipedia

From http://en.wikipedia.org/wiki/chernoff_bound

$$\Pr[X > (1+\delta)\mu] \le \inf_{t>0} \frac{E\left[\prod_{i=1}^{n} \exp(tX_i)\right]}{\exp(t(1+\delta)\mu)} = \inf_{t>0} \frac{\prod_{i=1}^{n} E[\exp(tX_i)]}{\exp(t(1+\delta)\mu)} = \inf_{t>0} \frac{\prod_{i=1}^{n} \left[p_i \exp(t) + (1-p_i)\right]}{\exp(t(1+\delta)\mu)}$$

The third line above follows because $e^{tX_i}$ takes the value $e^t$ with probability $p_i$ and the value 1 with probability $1 - p_i$.

Rewriting $p_i e^t + (1 - p_i)$ as $p_i(e^t - 1) + 1$ and recalling that $1 + x \le e^x$ (with strict inequality if $x > 0$), we set $x = p_i(e^t - 1)$.

Proof from Wikipedia

Thus,
$$\Pr[X > (1+\delta)\mu] < \frac{\prod_{i=1}^{n} \exp(p_i(e^t - 1))}{\exp(t(1+\delta)\mu)} = \frac{\exp\left((e^t - 1)\sum_{i=1}^{n} p_i\right)}{\exp(t(1+\delta)\mu)} = \frac{\exp((e^t - 1)\mu)}{\exp(t(1+\delta)\mu)}.$$

If we simply set $t = \log(1+\delta)$, so that $t > 0$ for $\delta > 0$, we can substitute and find
$$\frac{\exp((e^t - 1)\mu)}{\exp(t(1+\delta)\mu)} = \frac{\exp((1+\delta-1)\mu)}{(1+\delta)^{(1+\delta)\mu}} = \left[ \frac{\exp(\delta)}{(1+\delta)^{(1+\delta)}} \right]^{\mu}$$
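To get a feeling for how tight the bound is, the sketch below compares the right-hand side of (4) with the exact tail of a binomial variable (the special case in which all p_i are equal). The parameters n = 100 and p = 0.1 are arbitrary illustrative choices, and the function names are not from the notes.

```python
from math import comb, exp


def chernoff_bound(mu: float, delta: float) -> float:
    """Right-hand side of eq. (4)."""
    return (exp(delta) / (1 + delta) ** (1 + delta)) ** mu


def binomial_tail(n: int, p: float, threshold: float) -> float:
    """Exact Pr(X > threshold) for X ~ Binomial(n, p)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if k > threshold
    )


n, p = 100, 0.1
mu = n * p
for delta in (0.5, 1.0, 2.0):
    exact = binomial_tail(n, p, (1 + delta) * mu)
    bound = chernoff_bound(mu, delta)
    print(f"delta={delta}: exact tail={exact:.3e}, Chernoff bound={bound:.3e}")
```

The bound is loose for small δ but decays exponentially in µ, which is the property the frame-size argument relies on.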

Minimum frame size to avoid overflow

Frame size and overflow

Let $\gamma = \rho e^{1-\rho}$. If
$$T \ge \frac{\log(N/\epsilon)}{\log(1/\gamma)}$$
then Pr(frame overflow) $\le \epsilon$.

[Figure: γ versus ρ, for ρ between 0 and 1]

[Figure: minimum frame size T versus ρ, logarithmic vertical axis, for ε = 0.1, 0.01, 0.001, 0.0001. Left panel: N = 16. Right panel: N = 1024]

Proof - I

Consider a generic output $j$. Let $C(T)$ be the number of packets arrived during the frame and destined to $j$:
$$C(T) = \sum_{i=1}^{N} a_{ij}(T)$$
$$\Pr(\text{overflow for output } j) = \Pr(C(T) > T)$$
$$C(T) = \sum_{i=1}^{N} \sum_{t=1}^{T} X_{it}$$
where $X_{it} = 1$ with probability $\lambda_{ij}$ and all $X_{it}$ are independent random variables. We can use the Chernoff bound:
$$\mu = E[C(T)] = \sum_{i=1}^{N} \sum_{t=1}^{T} E[X_{it}] = T \sum_{i=1}^{N} \lambda_{ij} = T\hat{\rho}$$
having defined $\hat{\rho} = \sum_{i=1}^{N} \lambda_{ij} \le \rho$.

Proof - II

Using (4):
$$\Pr(C(T) > T) = \Pr\left(C(T) > \frac{\mu}{\hat{\rho}}\right) < \left( \frac{e^{\delta}}{(1+\delta)^{(1+\delta)}} \right)^{\mu}$$
being $1 + \delta = 1/\hat{\rho}$ and $\delta = 1/\hat{\rho} - 1$. Hence
$$\Pr(C(T) > T) < \left( \frac{e^{1/\hat{\rho} - 1}}{(1/\hat{\rho})^{1/\hat{\rho}}} \right)^{\hat{\rho} T} = \left( \frac{e^{1-\hat{\rho}}}{1/\hat{\rho}} \right)^{T} = \left(\hat{\rho}\, e^{1-\hat{\rho}}\right)^{T} \le \gamma^{T}$$
since the function $\gamma$ is increasing with respect to $\rho$.

By the union bound:
$$\Pr(\text{overflow for any output}) \le \sum_j \Pr(\text{overflow for output } j) \le N\gamma^{T}$$

Now we can set $N\gamma^{T} \le \epsilon$ and obtain
$$T \log(\gamma) \le \log(\epsilon/N) \quad \Rightarrow \quad T \ge \frac{\log(N/\epsilon)}{\log(1/\gamma)}$$
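The final inequality translates directly into a frame-size rule. The following sketch (with hypothetical helper names `gamma` and `min_frame_size`) computes the smallest integer T satisfying it and checks that N γ^T is indeed at most ε.

```python
from math import ceil, exp, log


def gamma(rho: float) -> float:
    """gamma = rho * e^(1 - rho), increasing in rho on (0, 1)."""
    return rho * exp(1 - rho)


def min_frame_size(n: int, rho: float, eps: float) -> int:
    """Smallest integer T with T >= log(N / eps) / log(1 / gamma)."""
    if not 0 < rho < 1:
        raise ValueError("rho must be in (0, 1)")
    return ceil(log(n / eps) / log(1 / gamma(rho)))


for n in (16, 1024):
    for eps in (0.1, 0.001):
        t = min_frame_size(n, 0.5, eps)
        print(f"N={n}, eps={eps}: T={t}, N*gamma^T={n * gamma(0.5) ** t:.2e}")
```

Going from N = 16 to N = 1024 only adds a constant number of slots to T for a fixed ρ and ε, which is the logarithmic scaling claimed in the notes.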

Queue-Aware Frame scheduling

Assume $\epsilon$ small enough that frame overflows are negligible. Then all packets are served with a delay $\le 2T$.

Delay for Queue-Aware Frame scheduling:
$$W_{QAF} \le \frac{2\log(N/\epsilon)}{\log(1/\gamma)} = O(\log N)$$

Note that [1] formally proves this property for the average delay $W_{QAF}$ by also considering the delays of the overflow packets, which are not dropped as assumed here.
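The three delay expressions derived in these notes can be tabulated side by side. The sketch below uses ρ = 0.5 and ε = 0.01 as arbitrary illustrative values; the function names are not from the original slides.

```python
from math import exp, log


def w_rf(n: int, rho: float) -> float:
    """Random Frame scheduler delay, (N - 1) / (1 - rho)."""
    return (n - 1) / (1 - rho)


def w_pf(n: int, rho: float) -> float:
    """Periodic Frame scheduler delay, 1 + N / (2 (1 - rho))."""
    return 1 + n / (2 * (1 - rho))


def w_qaf_bound(n: int, rho: float, eps: float) -> float:
    """Upper bound on the Queue-Aware Frame delay, 2 log(N/eps) / log(1/gamma)."""
    g = rho * exp(1 - rho)
    return 2 * log(n / eps) / log(1 / g)


rho, eps = 0.5, 0.01
print(f"{'N':>6} {'W_RF':>10} {'W_PF':>10} {'W_QAF bound':>12}")
for n in (16, 256, 4096):
    print(f"{n:>6} {w_rf(n, rho):>10.1f} {w_pf(n, rho):>10.1f} {w_qaf_bound(n, rho, eps):>12.1f}")
```

For small switches the constant in the logarithmic bound can exceed the O(N) delays, so the advantage of queue-aware frame scheduling shows up only as N grows.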

Conclusions

Take home messages
- Output-Queued switch: $W_{OQ} = O(1)$
- Queue-Independent Frame scheduler: $W_{QIF} = O(N)$
  - Random Frame scheduler: $W_{RF} = O(N)$
  - Periodic Frame scheduler: $W_{PF} = O(N)$
- Queue-Aware Frame scheduler: $W_{QAF} = O(\log N)$