CS 152 Midterm 2 May 2, 2002 Bob Brodersen

Name: Solutions

Show your work if you want partial credit! Try all the problems; don't get stuck on one of them. Each one is worth 10 points.

Question 1: Delay, Capacitance and Energy

[Figure: compound gate (terminals W, X, Y, Z) and its equivalent compound block (terminals W, X, Y, Z).]

Use the graphs below for the delay characteristics of the inverter and NAND gate. Co is the capacitance connected to the output.

[Figure: delay versus Co for the NAND gate (inputs A and B, output O; load at A and B = 10 fF) and the inverter (input A, output O; load at A = 10 fF).
  Delay A->O (or B->O) for high to low transitions: 0.1 ns, slope = 0.01 ns/fF.
  Delay A->O (or B->O) for low to high transitions: 0.2 ns, slope = 0.04 ns/fF.]

a) Fill in this table for the composite block (W to Y). Assume the wiring load is negligible and X = 1.

W -> Y              Internal delay   Load dependent delay   Input load at W
Low to High at Y    0.2 ns           0.04 ns/fF             20 fF
High to Low at Y    0.9 ns           0.01 ns/fF             20 fF

b) If W & X start high and go low at the same time, how much energy does the logic in this compound gate use if the supply is at 3 volts? (Assume no load at X and Y and do not include the energy used to drive this gate.)

Assume the only capacitance being switched is the input capacitance of each gate.

CV^2 = (40 fF of internal node capacitance) * 3^2 = 360 femtojoules.
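
The arithmetic in this question can be checked with a short Python sketch. It assumes the 0.1 ns and 0.2 ns values on the graphs are the zero-load intercepts of a linear delay model; the function names are hypothetical and not part of the exam.

```python
# Illustrative sketch only; names are hypothetical, not from the exam.

def gate_delay_ns(intercept_ns, slope_ns_per_ff, c_out_ff):
    """Linear delay model (assumed): delay = intercept + slope * Co."""
    return intercept_ns + slope_ns_per_ff * c_out_ff


def switching_energy_fj(c_ff, vdd_volts):
    """Energy C * V^2 in femtojoules, for C in femtofarads and V in volts."""
    return c_ff * vdd_volts ** 2


# One gate driving a single 10 fF gate input.
print(gate_delay_ns(0.1, 0.01, 10))  # high-to-low transition
print(gate_delay_ns(0.2, 0.04, 10))  # low-to-high transition

# Part b: 40 fF of internal node capacitance switched at 3 V.
print(switching_energy_fj(40, 3))  # 360
```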

Question 2: CPI

Assume a processor has a clock rate of 500 MHz and an ideal CPI (no memory misses) of 1.0. What is the effective CPI if a program with a mix of 50% arithmetic and logic, 30% load/stores and 20% control instructions is run, if 10% of the data memory operations and 1% of the instructions have a miss penalty of 50 cycles? Show the equation you used to get your answer.

Effective CPI = Base CPI + Data mem misses + Inst. mem misses
              = 1 + (0.3)(0.1)(50) + (0.01)(1)(50)
              = 1 + 1.5 + 0.5 = 3.0
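
The same equation as a minimal Python sketch; the function and parameter names are illustrative assumptions, not part of the exam.

```python
# Illustrative sketch only; names are hypothetical.

def effective_cpi(base_cpi, data_frac, data_miss_rate, inst_miss_rate, penalty):
    """Base CPI plus stall cycles per instruction from data and instruction misses."""
    data_stalls = data_frac * data_miss_rate * penalty  # only loads/stores access data memory
    inst_stalls = 1.0 * inst_miss_rate * penalty        # every instruction is fetched
    return base_cpi + data_stalls + inst_stalls


cpi = effective_cpi(base_cpi=1.0, data_frac=0.30, data_miss_rate=0.10,
                    inst_miss_rate=0.01, penalty=50)
print(round(cpi, 2))  # 3.0
```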

Question 3: Pipelining hazards

The = block after the registers is a comparator; assume its output is available to the controller.

(a) How many branch delay slots does this datapath need? Explain why.

One delay slot. By the time the branch is decoded and a decision is made by the comparator, there is already one instruction following it in the IF stage.

(b)
    add   $2, $1, $3
    addiu $1, $2, 1234
    beq   $1, $0, label

This code demonstrates that this datapath has a hazard problem. What is the hazard, what kind is it and what changes to the datapath are needed to eliminate it?

Read-after-write hazard between the addiu and beq instructions. This doesn't work because there is no way to forward register $1 to the ID stage. Therefore, the beq will not get the proper value for register $1. It can be fixed by moving the forwarding muxes to the ID stage before the comparator or adding an extra set of forwarding muxes there.

Question 4: Tomasulo scheduling

Functional unit type   Cycles
Loads                  1
Integer                1
FP adder               3
FP multiplier          6

ld1:  ld    $f0, 0($r1)
mlt1: multd $f4, $f0, $f2
ld2:  ld    $f6, 0($r2)
mlt2: multd $f6, $f4, $f6
mlt3: multd $f2, $f4, $f6
sd:   sd    0($r2), $f2

Consider the single issue Tomasulo processor and program shown above. Assume there is an integer unit that can process all integer operations in one clock cycle. Enter into the table below the clock cycle of the issue, start of execution and when the result is posted to the CDB. Also fill in the entries in the table below which show the value in each FP register and what the entries are in the Register Result Status table at the clock cycle right after issue of each instruction.

                            Register result status       FP registers
       Issue  Exec  CDB     F0    F2     F4     F6       F0      F2   F4       F6
ld1    1      2     3       ld1
mlt1   2      4     10      ld1          mult1
ld2    3      4     5                    mult1  ld2      M($r1)                M($r2)
mlt2   4      11    17                   mult1  mult2    M($r1)                M($r2)
mlt3   11     18    24            mult1         mult2    M($r1)       $r1*$f2  M($r2)
sd     12     25    N/A           mult1         mult2    M($r1)       $r1*$f2  M($r2)
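
The Exec and CDB columns can be reproduced with a small Python sketch. This is not a full Tomasulo simulator: it takes the issue cycles from the solution as given (so the structural stall that delays mlt3 is not modeled), and it assumes that execution starts the cycle after both issue and the last operand broadcast, and that the result reaches the CDB `latency` cycles after execution starts. Those rules and all names are assumptions chosen to match the numbers in the table.

```python
# Illustrative sketch only; timing rules are assumptions inferred from the table.

LATENCY = {"ld": 1, "multd": 6, "sd": 1}

# (label, operation, issue cycle from the solution, producers of its source operands)
PROGRAM = [
    ("ld1", "ld", 1, []),
    ("mlt1", "multd", 2, ["ld1"]),          # $f0 from ld1; $f2 is already available
    ("ld2", "ld", 3, []),
    ("mlt2", "multd", 4, ["mlt1", "ld2"]),  # $f4 from mlt1, $f6 from ld2
    ("mlt3", "multd", 11, ["mlt1", "mlt2"]),
    ("sd", "sd", 12, ["mlt3"]),             # stores $f2 from mlt3
]


def schedule(program):
    """Return {label: (issue, exec_start, cdb)}; cdb is None for the store."""
    cdb = {}
    rows = {}
    for label, op, issue, deps in program:
        operands_ready = max([cdb[d] for d in deps], default=0)
        start = max(issue, operands_ready) + 1
        done = None if op == "sd" else start + LATENCY[op]
        if done is not None:
            cdb[label] = done
        rows[label] = (issue, start, done)
    return rows


for label, row in schedule(PROGRAM).items():
    print(label, row)
```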

Question 5: Logic delay

[Figure: datapath with registers A and B, the ALU and the control unit. Labels: clk1, WriteA, A, 32-bits, clock skew, ALU, ALU Op, Control, clk2, B, Shift right, WriteB, LSB, 64-bits.]

Registers A and B:
- setup time = 3 ns
- hold time = 2 ns
- clock-to-q time = 3 ns
- shift time = 1 ns

ALU:
- shortest delay path = 1 ns
- longest delay path = 6 ns

Assume delays through the control logic are negligible.

a) Suppose that WriteA and WriteB are always asserted. What is the maximum allowable skew between clk1 and clk2 and in which direction?

A clk-to-q + ALU (shortest path) - Skew > B hold
3 + 1 - Skew > 2
Skew < 2 ns, with clk1 being skewed to be before clk2.

b) Suppose that WriteA is asserted once to load a value into register A and then the circuit is allowed to run for a number of cycles. While running, the control unit will alternately assert the Shift right signal and then the WriteB signal. What is the maximum clock frequency at which we can run this circuit?

B clk-to-q + ALU (longest path) + B setup = 3 + 6 + 3 = 12 ns; f_clk = 1 / 12 ns = 83 MHz
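
A minimal Python sketch of both timing constraints, using the register and ALU parameters given in the question. The variable names are illustrative assumptions.

```python
# Illustrative sketch only; names are hypothetical. All times in ns.

CLK_TO_Q = 3
SETUP = 3
HOLD = 2
ALU_SHORTEST = 1
ALU_LONGEST = 6

# Part a, hold constraint: clk-to-q + shortest ALU path - skew > hold,
# so the skew must stay strictly below this bound.
skew_bound_ns = CLK_TO_Q + ALU_SHORTEST - HOLD

# Part b, setup constraint: clk-to-q + longest ALU path + setup <= period.
min_period_ns = CLK_TO_Q + ALU_LONGEST + SETUP
max_freq_mhz = 1000 / min_period_ns

print(skew_bound_ns, min_period_ns, round(max_freq_mhz, 1))  # 2 12 83.3
```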

Question 6: Branch Prediction

Suppose we have a deeply pipelined processor, for which we implement a branch-target buffer for the conditional branches, which are 15% of the instructions. Assume that the misprediction penalty is always 3 cycles and the buffer miss penalty is always 6 cycles. Assume 90% hit rate in the buffer and 75% accuracy of the buffer prediction. Assume a base CPI without branch stalls of 1.

a) What is the CPI? Explain your answer in words as well as equations.

CPI = 1 + prob. of branch * (miss in buffer + in buffer but mispredicted)
    = 1 + 0.15 * ((0.1)(6) + (0.9)(0.25)(3))
    = 1 + 0.15 * (0.6 + 0.675) = 1.19

b) What are the entries in a branch history table and how are they indexed?

The entries are bits that indicate whether past branches were taken or not, and the table is indexed by the lower bits of the instruction address.
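
The CPI equation from part a, written as a minimal Python sketch. The function and parameter names are illustrative assumptions.

```python
# Illustrative sketch only; names are hypothetical.

def btb_cpi(base_cpi, branch_frac, hit_rate, accuracy,
            mispredict_penalty, miss_penalty):
    """Base CPI plus average branch stall cycles per instruction."""
    buffer_miss = (1 - hit_rate) * miss_penalty                     # branch not in the buffer
    wrong_prediction = hit_rate * (1 - accuracy) * mispredict_penalty  # in buffer, predicted wrong
    return base_cpi + branch_frac * (buffer_miss + wrong_prediction)


cpi = btb_cpi(base_cpi=1.0, branch_frac=0.15, hit_rate=0.90, accuracy=0.75,
              mispredict_penalty=3, miss_penalty=6)
print(round(cpi, 2))  # 1.19
```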

Question 7: Caches

Cache C1 is direct-mapped, C2 is fully associative, and C3 is 2-way set associative. Each has four one-word blocks (4 total words). Assume that the miss penalty for each is 10 clock cycles. Assume that the caches are initially empty. Using word addresses, fill in the chart below with whether each memory reference hits or misses and which block it would be in, for all of the caches. At the bottom of the chart, compute the hit rate and the total miss penalty. Use an LRU strategy for replacement when appropriate.

Memory         Cache 1 (direct)      Cache 2 (assoc)    Cache 3 (2-way set assoc)
reference      H/M?   Block #        H/M?   Block #     H/M?   Set   Block #
0              M      0              M      0           M      0     0
4              M      0              M      1           M      0     1
8              M      0              M      2           M      0     0
0              M      0              H      0           M      0     1
4              M      0              H      1           M      0     0
8              M      0              H      2           M      0     1
Hit rate       0                     50%                0
Miss penalty   6 * 10 = 60 cycles    30 cycles          60 cycles
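
The chart can be reproduced with a small LRU cache model in Python. This is a sketch under stated assumptions: one-word blocks, the set index is the word address modulo the number of sets, and empty ways are filled lowest-numbered first. The function names are hypothetical.

```python
# Illustrative sketch only; names and fill order are assumptions.

def simulate(refs, num_blocks, ways):
    """Return a list of (hit, set_index, way) per word-address reference, using LRU."""
    num_sets = num_blocks // ways
    contents = [[None] * ways for _ in range(num_sets)]  # address held in each way
    last_used = [[-1] * ways for _ in range(num_sets)]   # time of last access per way
    results = []
    for t, addr in enumerate(refs):
        s = addr % num_sets
        if addr in contents[s]:
            hit, way = True, contents[s].index(addr)
        else:
            hit = False
            # Evict the least recently used way (empty ways count as oldest).
            way = min(range(ways), key=lambda w: last_used[s][w])
            contents[s][way] = addr
        last_used[s][way] = t
        results.append((hit, s, way))
    return results


def summarize(results, miss_penalty=10):
    """Return (hit rate, total miss penalty in cycles)."""
    hits = sum(1 for hit, _, _ in results if hit)
    return hits / len(results), (len(results) - hits) * miss_penalty


refs = [0, 4, 8, 0, 4, 8]
for name, ways in [("C1 direct", 1), ("C2 fully assoc", 4), ("C3 2-way", 2)]:
    print(name, summarize(simulate(refs, num_blocks=4, ways=ways)))
```

With `ways=1` the set index is the block number; with `ways=4` there is a single set and the way is the block number.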

Question 8: Multicycle datapath

[Figure: multicycle datapath built around two shared buses. Labels: BusA, BusB, nPC, PC, IR, SX, ZX, Rs/Rt/Rd, Register File, A, B, S, MEM, M.]

The datapath above forms a multicycle processor which uses two time-multiplexed buses for communication rather than point-to-point connections and muxes. SX and ZX are the sign- and zero-extended immediates.

a) For this datapath draw an FSM (with bubbles and arcs) for Fetch, Decode and the operations ADDI and LW.

Fetch -> Decode -> Dispatch
  ADDI: Execute:    S <- A + SX
        Writeback:  Regfile <- S, then back to Fetch
  LW:   Execute:    S <- A + SX
        Memory:     Mem <- S
        Writeback:  Regfile <- M, then back to Fetch

b) Fill out the microprogram table below to implement this FSM. The SrcA and SrcB fields specify which signals will be assigned to BusA and BusB; the WrtA and WrtB fields specify what components are receiving inputs from the buses. The SrcX and WrtX fields can be any one of the state registers (IR, A, B, S, M), the register file (RegFile), or memory (Mem). Sequence specifies the function of a jump counter (next for next instruction, fetch for go to 00, or dispatch).

mAddr  Instruction  SrcA  SrcB  ALUOp  WrtA  WrtB  Sequence
00  Fetch   PC  Mem  IR  Next
01  Decode  Dispatch
02  ADDI    A  SX  ADD  Next
03          S  RegFile  Fetch
04  LW      A  SX  ADD  Next
05          S  Mem  Next
06          M  RegFile  Fetch
07
08
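
The Sequence column alone determines the order in which microinstructions run. The Python sketch below steps a jump counter through the table for each instruction; it models only sequencing, not the bus transfers, and the dictionary and function names are hypothetical.

```python
# Illustrative sketch only; models the Sequence field, not the datapath.

# Sequence field per microaddress, copied from the microprogram table.
SEQUENCE = {0: "next", 1: "dispatch", 2: "next", 3: "fetch",
            4: "next", 5: "next", 6: "fetch"}

# Dispatch targets: first microaddress of each instruction's routine.
DISPATCH = {"ADDI": 2, "LW": 4}


def trace(opcode):
    """Microaddresses visited for one instruction, from Fetch until control returns to 00."""
    addr, visited = 0, []
    while True:
        visited.append(addr)
        seq = SEQUENCE[addr]
        if seq == "fetch":
            return visited
        addr = DISPATCH[opcode] if seq == "dispatch" else addr + 1


print(trace("ADDI"))  # [0, 1, 2, 3]
print(trace("LW"))    # [0, 1, 4, 5, 6]
```

Under this sequencing ADDI takes four microcycles and LW takes five, matching the number of bubbles on each path of the FSM.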