# The Y86 Pipelined Datapath: Data and Control Hazards

CSCI 237: Computer Organization
24<sup>th</sup> Lecture, Monday, November 4, 2024

**Kelly Shaw**

Slides originally designed by Bryant and O'Hallaron @ CMU for use with Computer Systems: A Programmer's Perspective, Third Edition

# Last Time: The Y86 Datapath

- Construction of a single-cycle datapath for Y86
- Pipelining Concepts

### Administrative Details

- Lab #4 due **Thursday** at 11pm
- Partner signup for lab 5 by Wednesday at noon
- Read CSAPP Ch. 4.4-4.5

# Today: The Y86 Pipelined Datapath

- Construction of a pipelined datapath for Y86
  - Adding pipeline registers
- Data hazards
- Ways to deal with data hazards
  - Stalling
  - Data forwarding
- Control hazards
  - Branch prediction

# Obstacles to speedup in Pipelining

1. Uneven Stages
2.
Pipeline Register Delay

Ideal cycle time without the above limitations for an n-stage pipeline: OldCycleTime / n

# Our goals for a better processor design

- Faster clock rate
- Use machine more efficiently
  - No longer execute only one instruction at a time

# Creating Stages to transform Seq Design

Stages: IF, ID, EX, MEM, WB (Fetch, Decode, Execute, Memory, WriteBack)

- Fetch: get instruction
- Decode: read registers
- Execute: use ALU
- Memory: access memory
- WriteBack: write registers

# Pipelined Datapath

[Figure: pipelined datapath with Fetch, Decode, Execute, Memory, and Writeback stages separated by pipeline registers; labels include Inst Mem and Data Memory]

[Figure: pipeline timing diagram, cycles 1 to 8, for the sequence below]

    addq %rbx, %rbx
    mrmovq (%rax), %rsi
    rmmovq %rdi, (%r8)
    xor %r9, %r10

[Figure: the machine in cycle 5 for the same sequence]

- In what cycle was %rsi written? Cycle 6
- In what cycle was %r9 read? Cycle 5
- In what cycle was the addq executed? Cycle 3

# How Could We Solve this Problem?

- Compiler could add nop instructions before the later instruction
- We can add circuitry to detect the problem and stall the second instruction

# Easy Right? Not so fast.

    addq %rdi, %rsi
    xor %rsi, %r8
    rmmovq %r9, 0(%r9)
    andq %r8, %r10

- In what cycle does the addq write %rsi? Cycle 5
- In what cycle does the xor read %rsi? Cycle 3

Ahhhh! Values cannot pass backwards in time.

Only the register file reads and writes in half a cycle. All other stages take a full cycle; this is correct because of shared hardware.

- In what cycle does the addq write %rsi? 1st half of cycle 5
- In what cycle does the xor read %rsi?
2<sup>nd</sup> half of cycle 5

Stall: wasted cycles

[Figure: pipeline timing diagram showing the stall for the sequence below]

    addq %rdi, %rsi
    xor %rsi, %r8
    rmmovq %r9, 0(%r9)
    andq %r8, %r10

# Incorrect Execution caused by Data Hazard

- In what cycle does the mrmovq write %rsi? 1st half of cycle 5
- In what cycle does the xor read %rsi? 2nd half of cycle 3

Arrow to the left is information passed backwards in time.

[Figure: pipeline timing diagram for the sequence below, with an arrow pointing left from the mrmovq write to the xor read]

    mrmovq 0(%r9), %rsi
    xor %rsi, %r8
    rmmovq %r11, 0(%r10)
    andq %r12, %r13

# Barriers to pipelined performance

- Uneven stages
- Pipeline register delays
- Data Hazards
  - An instruction depends on the result of a previous instruction still in the pipeline, and that dependence has the potential to cause erroneous computation

# Practice on Your Own

Consider the Y86-64 code below. Are there any potential problems due to data hazards in this code?

    mrmovq (%rdi), %r8
    irmovq $4, %r9
    addq %r9, %r8
    rmmovq %r8, (%rdi)

## Read After Write (RAW) Data Dependences

- When a later instruction depends on the result of an earlier instruction
- If instructions are close enough in the pipeline, the later instruction may need to be stalled to ensure correctness

RAW: Read after Write

    addq %r8, %rsi
    subq %rsi, %r9
    xor %rax, %rax
    addq %rdi, %rdi

# In what cycle is %rsi calculated in the machine? In what cycle is %rsi used in the machine?

    mrmovq 0(%r9), %rsi
    xor %rsi, %r8
    rmmovq %r11, 0(%r10)
    andq %r12, %r13

# Solution 1: Data Forwarding

- In what cycle is %rsi calculated in the machine? End of cycle 4
- In what cycle is %rsi used?
[Figure: pipeline timing diagram for the sequence below]

    mrmovq 0(%r9), %rsi
    xor %rsi, %r8
    rmmovq %r11, 0(%r10)
    andq %r12, %r13

# Data Forwarding: $\text{Mem} \to \text{Ex}$

Where are those wires?

[Figure: pipelined datapath (Fetch, Decode, Execute, Memory, Writeback) with pipeline registers; signal labels include src1, src1data, src2data, dest2, and dest2data]

# Data Forwarding Example 2

- Draw the timing diagram with data forwarding
- Draw arrows to indicate data passing through forwarding

[Figure: timing diagram for the sequence below]

    mrmovq 0(%r9), %r10
    addq $8, %r10
    addq %r10, %r11
    rmmovq %rsi, 0(%r11)

## **Data Forwarding Details**

- Can forward
  - Memory to Execute
    - Value forwarded from Mem to next instruction's Ex stage
    - Value being forwarded from Mem may have been produced by Ex
    - Second instruction after the instruction producing the value in Ex needs the value in its Ex stage
  - Execute to Execute
    - Value forwarded from Ex stage to next instruction's Ex stage

## **Data Forwarding Circuitry**

- Info communicated via wires between stages:
  - From stage producing value
  - To each input consuming value
  - Register number being written by producer / consumed by reader
  - Value being written into register by producer
- Circuitry at the consuming stage
  - Comparator for register being written by producer and register being read by consumer
  - Output of comparator feeds MUX selecting between consuming stage's pipeline register value and value forwarded from producing stage

## **Handling Data Hazards**

- Caused by some RAW dependences
- Compiler can insert nops to delay later instruction
- Detect and stall
  - Detect that a register written by an earlier instruction (further in pipeline) will be read by a later instruction (earlier in pipeline) before the value is written to the register file
  - Prevent later instruction from completing decode stage
until the cycle in which the register is written to the register file (writeback stage)
- Data forwarding
  - Detect that a register written by an earlier instruction (further in pipeline) will be read by a later instruction (earlier in pipeline) before the value is written to the register file
  - When the value is needed by the later instruction (execute stage), determine if the earlier instruction (further in pipeline) has produced the value and can forward it to the execute stage

## Today: The Y86 Pipelined Datapath

- Pipelining Concepts
- Construction of a pipelined datapath for Y86
  - Adding pipeline registers
- Data hazards
- Ways to deal with data hazards
  - Stalling
  - Data forwarding
- Control hazards

# Control Hazard

- In what cycle does the nextPC get calculated for the jne? End of cycle 4
- In what cycle does the xorq get fetched? Beginning of cycle 3

[Figure: pipeline timing diagram for the sequence below]

        cmpq %r12, %r13
        jne end
        xorq %r8, %r9
    end:
        addq %r11, %r10

# Control Hazard: Stall until target known

[Figure: pipeline timing diagram for the same sequence with fetch stalled until the jne target is known]
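The cost of stalling until the target is known can be sketched with a small timing model. This is only an illustration, not the lecture's hardware design. The names `schedule`, `STAGES`, and `stall_on_jump` are hypothetical. The model assumes one fetch per cycle starting at cycle 1, a jump whose next PC is known at the end of its Execute stage, and the fall-through path, so `xorq` is the next instruction fetched.

```python
# Stage order taken from the lecture: Fetch, Decode, Execute, Memory, WriteBack.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def schedule(program, stall_on_jump=False):
    """Return a list of (instruction, {stage: cycle}) pairs.

    Assumptions (not from the slides' hardware): instructions are plain
    strings, any mnemonic starting with "j" is treated as a jump, and a
    stalled fetch resumes in the cycle after the jump's EX stage.
    """
    timing = []
    next_fetch = 1
    for text in program:
        fetch = next_fetch
        cycles = {stage: fetch + i for i, stage in enumerate(STAGES)}
        timing.append((text, cycles))
        if stall_on_jump and text.split()[0].startswith("j"):
            # Hold fetch until the next PC is known (end of the jump's EX).
            next_fetch = cycles["EX"] + 1
        else:
            next_fetch = fetch + 1
    return timing


program = ["cmpq %r12, %r13", "jne end", "xorq %r8, %r9", "addq %r11, %r10"]

for stall in (False, True):
    print("stall_on_jump =", stall)
    for text, cycles in schedule(program, stall_on_jump=stall):
        print(f"  {text:<18} IF={cycles['IF']} EX={cycles['EX']}")
```

Under these assumptions the model matches the answers above: the `jne` finishes Execute in cycle 4, and without stalling the `xorq` is fetched in cycle 3, before the outcome is known. With stalling, the `xorq` fetch moves to cycle 5, so two fetch cycles are wasted.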
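Returning to the detect-and-stall and data-forwarding strategies listed under Handling Data Hazards, the number of wasted cycles for a RAW dependence can be estimated with a similar sketch. The function `stalls_needed` and its parameters are hypothetical. The model assumes the five-stage pipeline from the lecture, a register file written in the first half of a cycle and read in the second half, a consumer that needs its operand in its Execute stage when forwarding is used, and forwarding paths from Ex and Mem into Ex.

```python
def stalls_needed(distance, producer_is_load=False, forwarding=False):
    """Stall cycles for a consumer issued `distance` instructions after its producer.

    Cycle offsets are relative to the producer's fetch cycle:
    IF=0, ID=1, EX=2, MEM=3, WB=4. The consumer's stages are shifted
    by `distance`. This is a simplified model, not the actual control logic.
    """
    if distance < 1:
        raise ValueError("consumer must come after its producer")
    if forwarding:
        # The value exists at the end of EX (ALU result) or MEM (loaded value),
        # so the consumer's EX stage can use it one cycle later.
        ready = (3 if producer_is_load else 2) + 1
        needed = distance + 2  # consumer's EX stage
    else:
        # The register file is written in the 1st half of WB and read in the
        # 2nd half of ID, so the consumer's ID may share the producer's WB cycle.
        ready = 4              # producer's WB stage
        needed = distance + 1  # consumer's ID stage
    return max(0, ready - needed)


# addq %rdi, %rsi followed immediately by xor %rsi, %r8, detect and stall
print(stalls_needed(1))
# same pair with Ex-to-Ex forwarding
print(stalls_needed(1, forwarding=True))
# mrmovq 0(%r9), %rsi followed immediately by xor %rsi, %r8, with Mem-to-Ex forwarding
print(stalls_needed(1, producer_is_load=True, forwarding=True))
```

With these assumptions, the back-to-back `addq`/`xor` pair needs two stall cycles without forwarding, which agrees with the `xor` reading %rsi in the second half of cycle 5 instead of cycle 3. Forwarding removes the stalls for an ALU producer, while a load followed immediately by its consumer still waits one cycle in this model.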