Computer and Digital System Architecture
EE/CpE-517-A
Bruce McNair
bmcnair@stevens.edu
Stevens Institute of Technology - All rights reserved 4-1/65

Week 4: ARM organization and implementation
Furber, Ch. 4

Evolution of ARM Implementation

Device                     Timeframe   Technology   Features
Acorn integer processors   1983-1985   3000 nm      3-stage pipeline
ARM6, ARM7                 1990-1995   300 nm       same
Newer ARM cores            >1995       <300 nm      5-stage pipeline, separate instruction/data caches

3-stage pipeline ARM organization
[Figure: block diagram of the 3-stage ARM datapath. Labels: A[31:0], control, address register, incrementer, PC, register bank, instruction decode & control, ALU bus, A bus, B bus, multiply register, barrel shifter, ALU, data out register, data in register, D[31:0].]
Callouts on the diagram, one per slide build:
- Register bank: 2 read ports and a write port; r15 (PC) read port; r15 write port for address increment
- Barrel shifter: shift or rotate by n bits
- Arithmetic/Logic Unit (ALU)
- Address register: select and hold memory addresses
- Incrementer: increment address as needed
- Data registers (data out, data in)
- Instruction decode and control logic
- Internal data buses: define all possible internal data transfers

ARM single-cycle instruction pipeline operation
[Figure: pipeline timing diagram. Instructions 1, 2 and 3 each pass through fetch, decode and execute, each starting one cycle after the previous one. Axes: instruction, time.]
- Fetch: current instruction memory address asserted; instruction read from memory; instruction placed in pipeline
- Decode: instruction decoded; datapath signals prepared for instruction
- Execute: instruction executed; datapath is owned by the instruction; registers are read, operand shifted, ALU processes data, result written to register
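
The overlap of fetch, decode and execute can be tabulated with a short script. This is a minimal sketch of an ideal 3-stage pipeline in which every instruction needs exactly one cycle per stage; the function names are hypothetical and chosen only for this illustration.

```python
STAGES = ["fetch", "decode", "execute"]

def pipeline_schedule(n_instructions):
    """Return {cycle: [(instruction, stage), ...]} for an ideal 3-stage pipeline.

    Cycles are 0-indexed; instructions are numbered from 1 as on the slide.
    """
    schedule = {}
    for i in range(n_instructions):            # instruction i+1 is fetched in cycle i
        for s, stage in enumerate(STAGES):
            schedule.setdefault(i + s, []).append((i + 1, stage))
    return schedule

def total_cycles(n_instructions):
    """Cycles needed to complete n single-cycle instructions."""
    if n_instructions == 0:
        return 0
    # Two cycles to fill the pipeline, then one instruction completes per cycle.
    return n_instructions + len(STAGES) - 1

if __name__ == "__main__":
    for cycle, work in sorted(pipeline_schedule(3).items()):
        print("cycle", cycle + 1, work)
    print("total cycles:", total_cycles(3))
```

Once the pipeline is full, all three stages are busy in the same cycle, which is why the throughput approaches one instruction per cycle.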

ARM multi-cycle instruction pipeline operation
[Figure: pipeline timing diagram. Axes: instruction, time.]
1  ADD  fetch, decode, execute                  (single-cycle ADD)
2  STR  fetch, decode, calc. addr., data xfer   (multi-cycle STR)
3  ADD  fetch, decode, execute
4  ADD  fetch, decode, execute
5  ADD  fetch, decode, execute
Bottleneck:
- Memory access is always a potential limitation of pipeline efficiency
- All instructions require at least one cycle of access to the datapath, maybe more
- Each cycle an instruction occupies the datapath also occupies the decode logic in the immediately preceding cycle
- Branch instructions cause the pipeline to be flushed
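
The effect of the two-cycle STR on the instructions behind it can be sketched as follows. This is a simplified model, assuming the datapath is the only shared resource and that ADD needs one datapath cycle while STR needs two (address calculation, then data transfer); the table and function names are hypothetical.

```python
# Assumed datapath (execute) cycles per instruction, following the slide.
DATAPATH_CYCLES = {"ADD": 1, "STR": 2}

def execute_windows(program):
    """Return (mnemonic, first_cycle, last_cycle) of datapath use, 1-indexed.

    The first instruction is fetched in cycle 1 and decoded in cycle 2, so its
    datapath use starts in cycle 3. Each later instruction gets the datapath
    as soon as the previous instruction releases it.
    """
    windows = []
    next_free = 3
    for op in program:
        n = DATAPATH_CYCLES[op]
        windows.append((op, next_free, next_free + n - 1))
        next_free += n
    return windows

if __name__ == "__main__":
    for window in execute_windows(["ADD", "STR", "ADD", "ADD", "ADD"]):
        print(window)
```

In this model the five-instruction sequence on the slide finishes in cycle 8 rather than cycle 7, because every instruction after the STR is pushed back by one cycle.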

Time to execute a program

$$T_{prog} = \frac{N_{inst} \times CPI}{f_{clk}}$$

- $N_{inst}$ = number of instructions. To reduce it: write efficient code.
- $CPI$ = average cycles per instruction. To reduce it: reimplement instructions that require more than one pipeline slot and/or reduce pipeline stalls caused by instruction dependencies.
- $f_{clk}$ = clock speed of processor. To raise it: simplify the logic in each pipeline stage, which requires more pipeline stages.
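
A small worked example of the execution-time relation. The instruction count, CPI values and clock rates below are hypothetical numbers picked only to show how each of the three levers changes the result.

```python
def program_time(n_inst, cpi, f_clk_hz):
    """Execution time in seconds: instructions * cycles/instruction / (cycles/second)."""
    return n_inst * cpi / f_clk_hz

if __name__ == "__main__":
    base = program_time(1_000_000, 1.9, 20e6)            # hypothetical baseline
    fewer_instructions = program_time(800_000, 1.9, 20e6)
    lower_cpi = program_time(1_000_000, 1.5, 20e6)
    faster_clock = program_time(1_000_000, 1.9, 40e6)
    print(base, fewer_instructions, lower_cpi, faster_clock)
```

Because the three factors multiply, an improvement in one can be cancelled by a regression in another, for example a deeper pipeline that raises the clock rate but also raises CPI through extra stalls.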

Memory bottleneck in load-store architecture
- Basic load-store architecture: load data from memory, processor instruction cycle, store result to memory
- Actual load-store system transfers: load instruction from memory, load data from memory, processor instruction cycle, store result to memory
- Practical implementation: fetch instruction, load data and store result all go through a single memory
- Improving performance: instructions are fetched from an instruction memory (loaded with the program); data is loaded from and results are stored to a separate data memory
- Alternative way of improving performance: an instruction cache and a data cache sit between the processor and a shared memory
  - When should instructions (or data) be fetched from memory to cache?
  - When should data be written back?

5 stage pipeline
Fetch, Decode, Execute, Buffer/data, Write back

5 stage pipeline contrasted with 3-stage
[Figure: the two pipelines drawn between registers and memory.]
- 5-stage: Fetch, Decode, Execute, Buffer/data, Write back
- 3-stage: Fetch, Decode, Execute

ARM9TDMI 5-stage pipeline organization
[Figure: datapath organized by pipeline stage. Labels by stage:]
- FETCH: next pc, +4, I-cache, pc + 4
- DECODE: pc + 8, r15, I decode, register read, instruction decode, immediate fields
- EXECUTE: B, BL, MOV pc, SUBS pc, +4, LDM/STM, mux, post-index, pre-index, mul, ALU, shift, reg shift, forwarding paths
- BUFFER/DATA: load/store address, byte repl., D-cache
- WRITE-BACK: LDR pc, rot/sgn ex, register write

Data forwarding in 5-stage pipeline
[Figure: 5-stage (Fetch, Decode, Execute, Buffer/data, Write back) and 3-stage (Fetch, Decode, Execute) pipelines drawn between registers and memory, with the execution result path highlighted.]
- Execution data is forwarded to subsequent processing stages to avoid pipeline stalls

    LDR rn, [XYZZY]   ; load rn from XYZZY
    ADD r2, r1, rn    ; r2 = r1 + rn

- rn is needed immediately, but this causes a pipeline stall
- The stall doesn't occur in the 3 stage pipeline
- The 5 stage stall can be prevented by reordering execution
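
The load-use stall and its removal by reordering can be illustrated with a simple counter. This sketch assumes a one-cycle penalty whenever a load is immediately followed by an instruction that reads the loaded register; the instruction tuples, register names and the extra independent SUB instruction are hypothetical.

```python
def count_load_use_stalls(program):
    """Count stalls in a list of (op, dest, sources) tuples.

    Assumption: forwarding covers ALU results, but loaded data arrives one
    stage later, so a consumer directly behind an LDR waits one cycle.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "LDR" and prev_dest in cur_sources:
            stalls += 1
    return stalls

# r0 plays the role of rn on the slide; the SUB is an unrelated instruction.
original = [
    ("LDR", "r0", ("r5",)),
    ("ADD", "r2", ("r1", "r0")),
    ("SUB", "r4", ("r3", "r6")),
]
# Move the independent SUB between the load and its consumer.
reordered = [original[0], original[2], original[1]]

if __name__ == "__main__":
    print(count_load_use_stalls(original), count_load_use_stalls(reordered))
```

The reordering is only legal because the SUB neither reads r0 or r2 nor writes a register the ADD uses; a compiler or programmer has to check such dependencies before moving instructions.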

Data processing instructions
[Figure: dataflow for a data processing instruction.]
- Operand 1: register
- Operand 2: register or immediate value (8 bit), passed through the barrel shifter
- ALU function: produces the result for the destination register and the condition codes
- PC: incremented by 4 bytes to give the new PC

Data processing instruction datapath activity
[Figure: datapath with active paths highlighted. Labels: address register, increment, registers, Rd, Rn, Rm, PC, mult, as ins., as instruction, [7:0], data out, data in, i. pipe. Panels: (a) register - register operations, (b) register - immediate operations.]

Data transfer instructions
[Figure: dataflow for a data transfer instruction. Labels: PC, base address register, register, immediate value (12 bit), operand 1, operand 2, increment 4 bytes, offset, ADD, new PC, address register, condition codes.]

STR (store register) datapath activity
[Figure: datapath with active paths highlighted. Labels: address register, increment, registers, Rn, Rd, PC, mult, lsl #0, shifter, = A / A + B / A - B, [11:0], = A + B / A - B, data out, data in, i. pipe, byte? Panels: (a) 1st cycle - compute address, (b) 2nd cycle - store data & auto-index.]

Branch instructions
[Figure: dataflow for a branch instruction. Labels: PC, immediate value (24 bit), operand 1, operand 2, left shift 2 bits, ADD, new PC, condition codes.]

The first two (of three) cycles of a branch instruction
[Figure: datapath with active paths highlighted. Labels: address register, increment, registers, PC, R14, mult, lsl #2, shifter, = A + B, = A, [23:0], data out, data in, i. pipe. Panels: (a) 1st cycle - compute branch target, (b) 2nd cycle - save return address.]
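
The branch target calculation in the first cycle (24-bit immediate, left shift by 2 bits, ADD to the PC) can be sketched as below. The sign extension of the offset and the fact that the PC reads as the branch address plus 8 are standard ARM behaviour rather than something stated on the slide; the function name is hypothetical.

```python
def branch_target(instr_addr, imm24):
    """Target address of an ARM branch located at instr_addr.

    imm24 is the 24-bit offset field of the instruction, treated as a
    two's-complement word offset.
    """
    offset = imm24 & 0xFFFFFF
    if offset & 0x800000:            # sign bit of the 24-bit field
        offset -= 1 << 24
    pc = instr_addr + 8              # PC value seen by the executing instruction
    return (pc + (offset << 2)) & 0xFFFFFFFF

if __name__ == "__main__":
    print(hex(branch_target(0x8000, 0x000000)))   # next-but-one instruction
    print(hex(branch_target(0x8000, 0xFFFFFE)))   # branch to itself
```

Shifting the offset left by 2 bits works because ARM instructions are word aligned, so the two low address bits never need to be encoded.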

Clocking schemes
- Edge sensitive FF (D, Q)
- Level sensitive FF (D, Q)

2-phase non-overlapping clock scheme
[Figure: clock waveforms for phase 1 and phase 2 over 1 clock cycle.]

ARM datapath timing
[Figure: timing diagram over phase 1 (φ1) and phase 2 (φ2). Labels: ALU operands latched, register read time, shift time, read bus valid, shift out valid, precharge invalidates buses, register write time, ALU time, ALU out.]
Minimum cycle time is made up of:
- Register read time
- Shifter delay
- ALU processing time (instruction dependent)
- Register write

The original ARM1 ripple-carry adder circuit
[Figure: 1-bit adder cell. Labels: A, B, Cin, sum, Cout.]
- 1-bit carry delay per stage
- Delay accumulates along the carry chain (C_IN0, C_IN1, C_IN2, ...): O[N] delay
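
The accumulating carry delay is easiest to see in a bit-level model. This is a logical sketch of a ripple-carry adder only; it does not model the actual ARM1 gate circuit, and the function names are hypothetical.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_carry_add(a, b, cin=0, width=32):
    """Add two width-bit values one bit at a time; returns (sum, carry_out)."""
    result = 0
    carry = cin
    for i in range(width):
        # Bit i cannot finish until the carry from bit i-1 is known,
        # so the worst-case delay grows linearly with width.
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry

if __name__ == "__main__":
    print(ripple_carry_add(5, 7))
    print(ripple_carry_add(0xFFFFFFFF, 1))
```

Adding 1 to 0xFFFFFFFF is the worst case: the carry has to pass through all 32 bit positions before the result is valid.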

The ARM2 4-bit carry look-ahead scheme
[Figure: 4-bit adder logic block. Labels: A[3:0], B[3:0], Cin[0], G, P, sum[3:0], Cout[3].]
- 4-bit carry delay
- O[N/4] delay
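
The role of the group generate (G) and propagate (P) signals can be sketched as follows. This is a behavioural model under the usual textbook definitions of G and P, not the ARM2 circuit itself; the function names are hypothetical.

```python
def group_gp(a4, b4):
    """Group generate and propagate for a 4-bit slice of the operands."""
    g = [(a4 >> i) & (b4 >> i) & 1 for i in range(4)]      # bit i generates a carry
    p = [((a4 >> i) | (b4 >> i)) & 1 for i in range(4)]    # bit i passes a carry on
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    return G, P

def cla_add(a, b, cin=0, width=32):
    """Add using 4-bit groups; the carry moves one whole group per step."""
    result, carry = 0, cin
    for lo in range(0, width, 4):
        a4, b4 = (a >> lo) & 0xF, (b >> lo) & 0xF
        G, P = group_gp(a4, b4)
        result |= ((a4 + b4 + carry) & 0xF) << lo   # sum bits from the 4-bit adder logic
        carry = G | (P & carry)                      # group carry out from G, P and carry in
    return result, carry

if __name__ == "__main__":
    print(cla_add(0x1234, 0x0FCD))
    print(cla_add(0xFFFFFFFF, 1))
```

Since G and P depend only on the operand bits, they are ready before the carry arrives, so the carry crosses each 4-bit group in a single step and a 32-bit add needs 8 such steps instead of 32.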

The ARM2 ALU logic for one result bit
[Figure: gate-level logic for one ALU bit. Labels: fs: 5, 0, 1, 2, 3, 4; NA, NB, G, P, carry logic, ALU bus.]

ARM2 ALU function codes

fs5 fs4 fs3 fs2 fs1 fs0   ALU output
 0   0   0   1   0   0    A and B
 0   0   1   0   0   0    A and not B
 0   0   1   0   0   1    A xor B
 0   1   1   0   0   1    A plus not B plus carry
 0   1   0   1   1   0    A plus B plus carry
 1   1   0   1   1   0    not A plus B plus carry
 0   0   0   0   0   0    A
 0   0   0   0   0   1    A or B
 0   0   0   1   0   1    B
 0   0   1   0   1   0    not B
 0   0   1   1   0   0    zero

ARM6 ALU structure
[Figure: an Arithmetic Unit and a Logic Unit whose outputs feed a MUX.]

The ARM6 carry-select adder scheme
[Figure: adder blocks for a,b[3:0] through a,b[31:28], each producing s and s+1 (labels: +, +1, c, s, s+1), with muxes selecting sum[3:0], sum[7:4], sum[15:8], sum[31:16].]
- O[log2(N)] delay
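
The carry-select idea, computing each block's sum for both possible carry-in values and letting a mux pick one, can be sketched as below. The block boundaries follow the sum[3:0], sum[7:4], sum[15:8], sum[31:16] grouping on the slide; everything else (names, the sequential loop standing in for parallel hardware) is an assumption of this illustration.

```python
# (low bit, width) of each block, following the sum groupings on the slide.
BLOCKS = [(0, 4), (4, 4), (8, 8), (16, 16)]

def carry_select_add(a, b, cin=0):
    """32-bit carry-select addition; returns (sum, carry_out)."""
    result, carry = 0, cin
    for lo, w in BLOCKS:
        mask = (1 << w) - 1
        x, y = (a >> lo) & mask, (b >> lo) & mask
        s0 = x + y            # block result assuming carry-in = 0
        s1 = x + y + 1        # block result assuming carry-in = 1
        # In hardware s0 and s1 are computed in parallel for every block;
        # the real carry-in only drives the mux select.
        chosen = s1 if carry else s0
        result |= (chosen & mask) << lo
        carry = chosen >> w
    return result, carry

if __name__ == "__main__":
    print(carry_select_add(0xFFFFFFFF, 1))
    print(carry_select_add(0x89ABCDEF, 0x12345678))
```

Because the block widths double, the number of mux stages a carry passes through grows with the logarithm of the word length, which is where the O[log2(N)] delay comes from.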

The ARM6 ALU organization
[Figure: ALU block diagram. Labels: A operand latch, B operand latch, invert A, invert B, XOR gates, function, logic functions, adder, C in, logic/arithmetic, result mux, zero detect, result, and the flag outputs C, V, N, Z.]
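
How the N, Z, C and V outputs relate to an adder result can be sketched as follows. This is a behavioural model of a 32-bit addition under the standard two's-complement definitions of the flags; it covers only the adder path, and the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF

def add_with_flags(a, b, cin=0):
    """32-bit add of unsigned bit patterns a and b; returns (result, flags)."""
    full = a + b + cin
    result = full & MASK32
    flags = {
        "N": result >> 31,                       # sign bit of the result
        "Z": int(result == 0),                   # zero detect
        "C": full >> 32,                         # carry out of bit 31
        # Overflow: operands share a sign that differs from the result's sign.
        "V": ((~(a ^ b) & (a ^ result)) >> 31) & 1,
    }
    return result, flags

if __name__ == "__main__":
    print(add_with_flags(0x7FFFFFFF, 1))
    print(add_with_flags(0xFFFFFFFF, 1))
```

Subtraction reuses the same adder by inverting one operand and setting the carry in, which is the purpose of the invert A and invert B XOR gates in front of it.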

Barrel shifter
[Figure: barrel shifter block. Data input D[0:N-1], data output Q[0:N-1]; control inputs: Right/Left, Arithmetic/Logical/Rotate, n shifts.]
Examples on the input 1 1 1 0 0 1 0 0, each moving the bits right by one position:
- 0 shifted in (logical): 0 1 1 1 0 0 1 0
- sign shifted in (arithmetic): 1 1 1 1 0 0 1 0
- bit shifted out re-enters at the other end (rotate): 0 1 1 1 0 0 1 0
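
The three right-shift modes can be reproduced on the slide's 8-bit example value 11100100. This is a behavioural sketch assuming 0 <= n < width; the function names follow common ARM mnemonics (LSR, ASR, ROR) and are not taken from the slide.

```python
def lsr(x, n, width=8):
    """Logical shift right: zeros enter from the left."""
    return (x >> n) & ((1 << width) - 1)

def asr(x, n, width=8):
    """Arithmetic shift right: copies of the sign bit enter from the left."""
    sign = (x >> (width - 1)) & 1
    fill = (((1 << n) - 1) << (width - n)) if sign else 0
    return ((x >> n) | fill) & ((1 << width) - 1)

def ror(x, n, width=8):
    """Rotate right: bits leaving on the right re-enter on the left."""
    n %= width
    return ((x >> n) | (x << (width - n))) & ((1 << width) - 1)

if __name__ == "__main__":
    value = 0b11100100
    for name, fn in (("lsr", lsr), ("asr", asr), ("ror", ror)):
        print(name, format(fn(value, 1), "08b"))
```

For this particular input the logical shift and the rotate give the same output, because the bit that leaves on the right is 0.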

The cross-bar switch barrel shifter principle
[Figure: 4 x 4 switch matrix connecting in[3], in[2], in[1], in[0] to out[0], out[1], out[2], out[3]. Diagonals are labeled right 3, right 2, right 1, no shift, left 1, left 2, left 3.]
4 bit shifter, rotate left:
- Inputs: in[3] = D, in[2] = C, in[1] = B, in[0] = A
- Outputs: out[0] = D, out[1] = A, out[2] = B, out[3] = C
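
The rotate-left example can be modelled as a matrix of switches in which exactly one diagonal is closed. This is a sketch of the principle only; list index i stands for in[i] and out[i], and the function name is hypothetical.

```python
def crossbar_rotate_left(inputs, k):
    """Rotate left by k positions using an n x n switch matrix.

    closed[i][j] is True when the switch joining in[i] to out[j] is on.
    Selecting a rotate amount closes one wrapped diagonal of the matrix.
    """
    n = len(inputs)
    closed = [[(i + k) % n == j for j in range(n)] for i in range(n)]
    outputs = [None] * n
    for i in range(n):
        for j in range(n):
            if closed[i][j]:
                outputs[j] = inputs[i]
    return outputs

if __name__ == "__main__":
    # in[0] = A, in[1] = B, in[2] = C, in[3] = D, as on the slide.
    print(crossbar_rotate_left(["A", "B", "C", "D"], 1))
```

Every output is driven through a single switch regardless of the shift amount, which is why a cross-bar shifter has the same delay for any shift distance.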

Register bank
- Master-Slave D-FF design requires 4 2-input NANDs
- A 2-input NAND requires 4 transistors
- One Master-Slave D-FF requires 16 transistors/bit
- A 32-bit register requires 512 transistors
- 16 registers require 8192 transistors
- This is too many transistors for a 35000 transistor processor

ARM2 register cell circuit
[Figure: register cell. Labels: ALU bus, A bus, B bus, write, read A, read B.]
- 7 transistor/bit design (the D-FF design didn't include I/O enables, either)
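
The transistor budget for the two register cell designs can be checked with a few lines of arithmetic. The per-bit counts, register width, register count and 35000-transistor total come from the slides; the function name is hypothetical.

```python
def register_bank_transistors(per_bit, bits=32, registers=16):
    """Total transistors for a register bank built from identical bit cells."""
    return per_bit * bits * registers

if __name__ == "__main__":
    budget = 35000
    for label, per_bit in (("master-slave D-FF", 16), ("ARM2 register cell", 7)):
        total = register_bank_transistors(per_bit)
        print(label, total, f"{100 * total / budget:.1f}% of budget")
```

The D-FF figure also understates the real cost, since the slide notes that it leaves out the read and write enable logic that the 7-transistor cell already includes.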

ARM coprocessor interface
[Figure: ARM processor and ARM coprocessors attached to memory and address connections, linked by the cpi, cpa and cpb signals.]
Coprocessors:
- Private register banks (up to 16), may be >32 bits
- Up to 16 coprocessors
Handshake signals:
- cpi = coprocessor instruction has been detected and ARM wishes it to be executed
- cpa = coprocessor absent
- cpb = coprocessor busy

ARM coprocessor handshaking
[Figure: flowchart. Nodes: Fetch instruction; Co-proc inst?; COND?; Branch?; Decode instruction; Execute instruction; Assert cpi; cpa asserted?; cpb asserted?; Coprocessor takes instr; Trap error; Busy wait; continue; handle; complete. Edges are labeled Yes/No.]
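
One plausible reading of the handshake, based on the cpi, cpa and cpb signal definitions, is sketched below. The exact edges of the flowchart are not recoverable from the slide text, so the ordering of the checks here is an assumption; the function name, its arguments and the returned labels are hypothetical.

```python
def coprocessor_handshake(condition_passed, cpa, cpb_per_cycle):
    """Model the ARM side of a coprocessor instruction handshake.

    condition_passed: result of the instruction's condition test (COND?).
    cpa: True if no coprocessor claims the instruction (coprocessor absent).
    cpb_per_cycle: iterable of cpb samples, one per cycle after cpi is asserted.
    Returns (outcome, busy_wait_cycles).
    """
    if not condition_passed:
        return ("skipped", 0)          # cpi is never asserted
    # ARM asserts cpi at this point.
    if cpa:
        return ("trap", 0)             # nobody accepted: trap error
    waits = 0
    for cpb in cpb_per_cycle:
        if not cpb:
            return ("executed", waits) # coprocessor takes the instruction
        waits += 1                     # coprocessor busy: ARM busy-waits
    return ("waiting", waits)

if __name__ == "__main__":
    print(coprocessor_handshake(True, cpa=False, cpb_per_cycle=[True, True, False]))
    print(coprocessor_handshake(True, cpa=True, cpb_per_cycle=[]))
```

The trap path is what lets software emulate a coprocessor that is not fitted: the handler for the trap can carry out the instruction instead.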