# **Sequential Y86-64 Implementations**

CSCI 237: Computer Organization 16<sup>th</sup> Lecture, Mar 17, 2025

Jeannie Albrecht



Slides originally designed by Bryant and O'Hallaron @ CMU for use with Computer Systems: A Programmer's Perspective, Third Edition

### **Administrative Details**

- Lab 3
  - Due this week (submit using submit237 3 files)
- Midterm details
  - This week during lab; closed book/notes; you can have a calculator
  - Focuses on Chapters 1-4 and Labs 1, 2, and 3
    - Emphasis on Ch 2-3, Labs 1-2, a little bit on Y86-64 (Lab 3) and Ch 4
  - Sample exams posted on webpage
- No class on Wed
  - I'll be in my office from 9:30-12 if you have questions
- Office hours: today 2:00-3:00, tomorrow 1:30-3:00
- Review session Tue 8pm in TPL 205
- We have class on Friday (but no office hours in the afternoon)
  - I will post the slides (as always) in case you miss class

- cmpq computes the difference between two integer operands and updates the OF, SF, ZF, and CF flags according to the result
  - cmpq ra, rb computes rb-ra and sets flags (condition codes)
- Conditional jump or move happens based on condition codes
- Example: jle jumps if ZF or (OF xor SF)
  - ZF handles the equals case
  - OF xor SF being set indicates that rb was less than ra in previous operation
  - So it jumps if rb <= ra
- In general,
  - cmpq ra, rb
  - jOP jumps if rb OP ra
- Y86-64 has no cmpX or testX instructions: subX and andX set the condition codes the same way, but they also write the result to rb
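The flag rules above can be checked with a small C model. This is only a sketch: the `Flags` struct and the function names are made up for illustration, and CF is left out because jle does not consult it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the three flags that jle consults. */
typedef struct { bool zf, sf, of; } Flags;

/* Model of `cmpq ra, rb`: compute rb - ra and record the flags. */
Flags cmpq_flags(int64_t ra, int64_t rb) {
    /* Subtract as unsigned so that wraparound is well defined in C. */
    uint64_t diff = (uint64_t)rb - (uint64_t)ra;
    bool diff_neg = (diff >> 63) != 0;
    Flags f;
    f.zf = (diff == 0);
    f.sf = diff_neg;
    /* Signed overflow: the operands differ in sign and the result's
       sign differs from rb's sign. */
    f.of = ((ra < 0) != (rb < 0)) && (diff_neg != (rb < 0));
    assert(f.zf == (ra == rb));   /* ZF handles the equals case */
    return f;
}

/* jle is taken when ZF or (OF xor SF). */
bool jle_taken(Flags f) {
    return f.zf || (f.of != f.sf);
}
```

With this model, `jle_taken(cmpq_flags(ra, rb))` agrees with `rb <= ra` even when the subtraction overflows, which is exactly why OF is part of the condition.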

- cmpq ra, rb
- jOP jumps if rb OP ra

```
int test(long x, long y) {      test:
  if (x < y)                      cmpq %rsi, %rdi   # y:x
    return 0;                     jge  .L3          # x>=y
  else                            movl $0, %eax
    return 1;                     ret
}                               .L3:                # else
                                  movl $1, %eax
                                  ret
```


```
int test(long x, long y) {      test:
  if (x >= y)                     cmpq %rsi, %rdi   # y:x
    return 0;                     jl   .L3          # x<y
  else                            movl $0, %eax
    return 1;                     ret
}                               .L3:                # else
                                  movl $1, %eax
                                  ret
```


| C code               | Assembly        | Comment  |
|----------------------|-----------------|----------|
| `int test(long x) {` | `test:`         |          |
| `if (x > 5)`         | `cmpq $5, %rdi` | `# 5:x`  |
| `return 0;`          | `jle .L3`       | `# x<=5` |
| `else`               | `movl $0, %eax` |          |
| `return 1;`          | `ret`           |          |
| `}`                  | `.L3:`          | `# else` |
|                      | `movl $1, %eax` |          |
|                      | `ret`           |          |

# Last Time

- Discussed combinational circuits
- Learned about HCL
- Overview of memory and clocking

### **Recap: HCL Summary**

- Book introduces a very simple hardware description language
- Can only express limited aspects of hardware operation
- Data Types
  - bool: Boolean
    - a, b, c, ...
  - int: words
    - A, B, C, ...
    - Does not specify word size (64-bit words, ...)
- Statements
  - bool a = bool-expr ;
  - int A = int-expr ;

### **Recap: HCL Operations**

- Classify by type of value returned
- Boolean Expressions
  - Logic Operations
    - a && b, a || b, !a
  - Word Comparisons
    - A == B, A != B, A < B, A <= B, A >= B, A > B
  - Set Membership
    - A in { B, C, D }
      - Same as A == B || A == C || A == D
- Word Expressions
  - Case expressions
    - [ a : A; b : B; c : C ]
    - Evaluate test expressions a, b, c, ... in sequence
    - Return word expression A, B, C, ... for first successful test
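As a rough software analogy, set membership and case expressions map onto ordinary C conditionals. The sketch below uses an illustrative three-way minimum (the names `in_set3` and `min3` are not from the slides). Real hardware evaluates all tests in parallel and selects the first one that holds; the if/else chain only models that priority order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HCL:  A in { B, C, D }   is the same as   A == B || A == C || A == D */
bool in_set3(int64_t a, int64_t b, int64_t c, int64_t d) {
    return a == b || a == c || a == d;
}

/* HCL:  int Min3 = [ A <= B && A <= C : A;
                      B <= A && B <= C : B;
                      1                : C ];
   Tests are tried in sequence; the first successful one picks the result. */
int64_t min3(int64_t a, int64_t b, int64_t c) {
    int64_t m;
    if (a <= b && a <= c)      m = a;
    else if (b <= a && b <= c) m = b;
    else                       m = c;   /* the test `1` always succeeds */
    assert(m <= a && m <= b && m <= c);
    return m;
}
```

A final test of `1` acts as the default case, much like a trailing `else`.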

# Today

#### Finish up Ch 4.2

#### Move on to sequential Y86-64 implementations (Ch 4.3)

- Organizing processing into stages
- SEQ Hardware Structure
- SEQ Stage Implementations
- SEQ Timing

# Recap: Hardware Registers



- Stores word of data
- Different from program registers seen in assembly code (e.g., %rdi)
- Collection of edge-triggered latches
- Loads input on *rising edge* of clock

#### **Random-Access Memory**



- Stores multiple words of memory (has internal storage)
  - Address input specifies which word to read or write
- Register file
  - Holds values of program registers (%rax, %rsp, etc.)
  - Register identifier serves as address
    - ID 15 (0xF) implies no read or write performed
- Multiple Ports
  - Can read and/or write multiple words in one cycle
    - Each has separate address and data input/output

## **Register File Timing**



- Reading
  - Like combinational logic
  - Output data generated based on input addr
    - After some small delay
- Writing
  - Like hardware register with respect to timing
  - Update only as clock rises
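The read/write asymmetry can be modeled in C as follows. This is a minimal sketch, not the actual simulator: the `RegFile` struct and the function names are hypothetical, and returning 0 when reading ID 0xF is an assumption made here for convenience.

```c
#include <assert.h>
#include <stdint.h>

#define RNONE 0xF   /* register ID 15: no read or write performed */

typedef struct {
    uint64_t regs[15];     /* the 15 program registers */
    /* Write-port inputs; they take effect only when the clock rises. */
    uint8_t  dstE, dstM;
    uint64_t valE, valM;
} RegFile;

/* Read port: like combinational logic, the output follows the address. */
uint64_t rf_read(const RegFile *rf, uint8_t src) {
    assert(src <= RNONE);
    return src == RNONE ? 0 : rf->regs[src];
}

/* Rising clock edge: the only moment at which stored state changes. */
void rf_clock_rise(RegFile *rf) {
    assert(rf->dstE <= RNONE && rf->dstM <= RNONE);
    if (rf->dstE != RNONE) rf->regs[rf->dstE] = rf->valE;
    if (rf->dstM != RNONE) rf->regs[rf->dstM] = rf->valM;
}
```

Setting `dstE` and `valE` alone changes nothing that `rf_read` can observe; the new value only becomes visible after `rf_clock_rise` is called.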



# Ch 4.2 Summary

- Computation
  - Performed by combinational logic
  - Computes Boolean functions
  - Continuously reacts to input changes
- Storage
  - Registers (hardware)
    - Hold single words
    - Loaded as clock rises
  - Random-access memories
    - Hold multiple words
    - Possible multiple read or write ports
    - Read word when address input changes
    - Write word as clock rises





# **SEQ Stages**

- Fetch
  - Read instruction from instruction memory
- Decode
  - Read the program registers specified by the instruction
- Execute
  - Compute value or address
- Memory
  - Read or write data
- Write Back
  - Write program registers to register file
- PC update
  - Update program counter



### **Instruction Decoding**



#### Instruction Format

- Instruction byte: icode:ifun
- Optional register byte: rA:rB
- Optional constant word: valC
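Splitting these fields out of raw instruction bytes might look like the sketch below. The helper names are hypothetical; the only facts relied on are that each of the first two bytes holds two 4-bit fields and that Y86-64 stores the 8-byte constant word in little-endian order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split a byte into its high and low 4-bit fields (icode:ifun or rA:rB). */
uint8_t hi4(uint8_t byte) { return (uint8_t)(byte >> 4); }
uint8_t lo4(uint8_t byte) { return (uint8_t)(byte & 0xF); }

/* Read an 8-byte little-endian constant word (valC) starting at mem[addr]. */
uint64_t read_word(const uint8_t *mem, size_t len, size_t addr) {
    assert(len >= 8 && addr <= len - 8);   /* stay inside the buffer */
    uint64_t word = 0;
    for (int i = 7; i >= 0; i--)
        word = (word << 8) | mem[addr + i];
    return word;
}
```

For example, the bytes `40 23 08 00 00 00 00 00 00 00` decode to icode 4, ifun 0, rA 2, rB 3, and valC 8, which is `rmmovq %rdx, 8(%rbx)`.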

# **Executing Arithmetic/Logical Operation**

OPq rA, rB is encoded as 6 fn rA rB

- Fetch
  - Read 2 bytes
- Decode
  - Read operand registers
- Execute
  - Perform operation
  - Set condition codes
- Memory
  - Do nothing
- Write back
  - Update register rB
- PC Update
  - Increment PC by 2 bytes

# Stage Computation: Arith/Log Ops

| Stage      | OPq rA, rB                       | (R[rB] = R[rB] OP R[rA])    |
|------------|----------------------------------|-----------------------------|
| Fetch      | icode:ifun $\leftarrow M_1[PC]$  | Read instruction byte       |
|            | $rA:rB \leftarrow M_1[PC+1]$     | Read register byte          |
|            | $valP \leftarrow PC+2$           | Compute next PC             |
| Decode     | $valA \leftarrow R[rA]$          | Read operand A              |
|            | $valB \leftarrow R[rB]$          | Read operand B              |
| Execute    | $valE \leftarrow valB$ OP $valA$ | Perform ALU operation       |
|            | Set CC                           | Set condition code register |
| Memory     |                                  |                             |
| Write back | $R[rB] \leftarrow valE$          | Write back result           |
| PC update  | $PC \leftarrow valP$             | Update PC                   |

- Formulate instruction execution as sequence of simple steps
- Use same general form for all instructions
- Note: M<sub>1</sub> indicates we're accessing 1 byte of memory (usually instruction memory), while M<sub>8</sub> indicates we're accessing 8 bytes (usually data memory)
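To make the stage sequence concrete, here is a minimal C sketch of one OPq instruction. The `Cpu` struct and `step_opq` are illustrative names, not part of any course tool, and the overflow flag is omitted to keep it short. The function codes 0 to 3 correspond to addq, subq, andq, and xorq.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t pc;
    uint64_t reg[15];
    bool     zf, sf;      /* OF is omitted to keep the sketch short */
} Cpu;

/* Execute one OPq instruction found at imem[cpu->pc]. */
void step_opq(Cpu *cpu, const uint8_t *imem) {
    /* Fetch */
    uint8_t  icode = imem[cpu->pc] >> 4;
    uint8_t  ifun  = imem[cpu->pc] & 0xF;
    uint8_t  rA    = imem[cpu->pc + 1] >> 4;
    uint8_t  rB    = imem[cpu->pc + 1] & 0xF;
    uint64_t valP  = cpu->pc + 2;
    assert(icode == 6 && ifun <= 3 && rA < 15 && rB < 15);

    /* Decode */
    uint64_t valA = cpu->reg[rA];
    uint64_t valB = cpu->reg[rB];

    /* Execute: valE <- valB OP valA, then set CC */
    uint64_t valE;
    switch (ifun) {
    case 0:  valE = valB + valA; break;   /* addq */
    case 1:  valE = valB - valA; break;   /* subq */
    case 2:  valE = valB & valA; break;   /* andq */
    default: valE = valB ^ valA; break;   /* xorq */
    }
    cpu->zf = (valE == 0);
    cpu->sf = (valE >> 63) != 0;

    /* Memory: nothing to do */

    /* Write back */
    cpu->reg[rB] = valE;

    /* PC update */
    cpu->pc = valP;
}
```

Running it on the bytes `61 03` (`subq %rax, %rbx`) with %rax = 3 and %rbx = 10 leaves 7 in %rbx and advances the PC by 2.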

## Executing rmmovq

rmmovq rA, D(rB) is encoded as 4 0 rA rB D

- Fetch
  - Read 10 bytes
- Decode
  - Read operand registers
- Execute
  - Compute effective address
- Memory
  - Write to memory at address R[rB]+D
- Write back
  - Do nothing
- PC Update
  - Increment PC by 10

#### Stage Computation: rmmovq

| Stage      | rmmovq rA, D(rB)                | (Move R[rA] to M<sub>8</sub>[R[rB]+D]) |
|------------|---------------------------------|----------------------------------------|
| Fetch      | icode:ifun $\leftarrow M_1[PC]$ | Read instruction byte                  |
|            | $rA:rB \leftarrow M_1[PC+1]$    | Read register byte                     |
|            | $valC \leftarrow M_8[PC+2]$     | Read displacement D                    |
|            | $valP \leftarrow PC+10$         | Compute next PC                        |
| Decode     | $valA \leftarrow R[rA]$         | Read operand A                         |
|            | $valB \leftarrow R[rB]$         | Read operand B                         |
| Execute    | $valE \leftarrow valB + valC$   | Compute effective address              |
| Memory     | $M_8[valE] \leftarrow valA$     | Write value to memory                  |
| Write back |                                 |                                        |
| PC update  | $PC \leftarrow valP$            | Update PC                              |

Use ALU for address computation
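The same steps in C, as a sketch under stated assumptions: a tiny 64-byte memory holding both code and data, and hypothetical helpers `m8_read` and `m8_write` standing in for the M<sub>8</sub>[...] accesses.

```c
#include <assert.h>
#include <stdint.h>

enum { MEM_SIZE = 64 };   /* tiny memory, an assumption of this sketch */

typedef struct {
    uint64_t pc;
    uint64_t reg[15];
    uint8_t  mem[MEM_SIZE];
} Cpu;

/* M8[addr]: read an 8-byte little-endian word. */
uint64_t m8_read(const Cpu *cpu, uint64_t addr) {
    assert(addr <= MEM_SIZE - 8);
    uint64_t word = 0;
    for (int i = 7; i >= 0; i--)
        word = (word << 8) | cpu->mem[addr + i];
    return word;
}

/* M8[addr] <- word: write an 8-byte little-endian word. */
void m8_write(Cpu *cpu, uint64_t addr, uint64_t word) {
    assert(addr <= MEM_SIZE - 8);
    for (int i = 0; i < 8; i++)
        cpu->mem[addr + i] = (uint8_t)(word >> (8 * i));
}

void step_rmmovq(Cpu *cpu) {
    /* Fetch: instruction byte, register byte, 8-byte displacement */
    uint8_t  rA   = cpu->mem[cpu->pc + 1] >> 4;
    uint8_t  rB   = cpu->mem[cpu->pc + 1] & 0xF;
    assert(cpu->mem[cpu->pc] == 0x40 && rA < 15 && rB < 15);
    uint64_t valC = m8_read(cpu, cpu->pc + 2);
    uint64_t valP = cpu->pc + 10;
    /* Decode */
    uint64_t valA = cpu->reg[rA];
    uint64_t valB = cpu->reg[rB];
    /* Execute: the ALU computes the effective address */
    uint64_t valE = valB + valC;
    /* Memory */
    m8_write(cpu, valE, valA);
    /* Write back: nothing to do */
    /* PC update */
    cpu->pc = valP;
}
```

Note that the ALU adds here even though rmmovq is not an arithmetic instruction; reusing it for address computation is what keeps the stage structure uniform.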

# Executing popq

- Fetch
  - Read 2 bytes
- Decode
  - Read stack pointer
- Execute
  - Increment stack pointer by 8
- Memory
  - Read from old stack pointer
- Write back
  - Update stack pointer
  - Write result to register
- PC Update
  - Increment PC by 2

### Stage Computation: popq

| Stage      | popq rA                         | (Move M<sub>8</sub>[R[%rsp]] to R[rA]) |
|------------|---------------------------------|----------------------------------------|
| Fetch      | icode:ifun $\leftarrow M_1[PC]$ | Read instruction byte                  |
|            | $rA:rB \leftarrow M_1[PC+1]$    | Read register byte                     |
|            | $valP \leftarrow PC+2$          | Compute next PC                        |
| Decode     | $valA \leftarrow R[\%rsp]$      | Read stack pointer                     |
|            | $valB \leftarrow R[\%rsp]$      | Read stack pointer                     |
| Execute    | $valE \leftarrow valB + 8$      | Increment stack pointer                |
| Memory     | $valM \leftarrow M_8[valA]$     | Read from stack                        |
| Write back | $R[\%rsp] \leftarrow valE$      | Update stack pointer                   |
|            | $R[rA] \leftarrow valM$         | Write back result                      |
| PC update  | $PC \leftarrow valP$            | Update PC                              |

Use ALU to increment stack pointer

- Must update two registers
  - Popped value
  - New stack pointer
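A C sketch of popq shows why the stack pointer is read twice: valA keeps the old value for the memory read while valB feeds the ALU. The `Cpu` struct and helper names are hypothetical, and the 64-byte memory is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

enum { RSP = 4, MEM_SIZE = 64 };   /* %rsp has register ID 4 */

typedef struct {
    uint64_t pc;
    uint64_t reg[15];
    uint8_t  mem[MEM_SIZE];
} Cpu;

/* M8[addr]: read an 8-byte little-endian word. */
uint64_t m8_read(const Cpu *cpu, uint64_t addr) {
    assert(addr <= MEM_SIZE - 8);
    uint64_t word = 0;
    for (int i = 7; i >= 0; i--)
        word = (word << 8) | cpu->mem[addr + i];
    return word;
}

void step_popq(Cpu *cpu) {
    /* Fetch */
    uint8_t  rA   = cpu->mem[cpu->pc + 1] >> 4;
    assert(cpu->mem[cpu->pc] == 0xB0 && rA < 15);
    uint64_t valP = cpu->pc + 2;
    /* Decode: both read ports fetch %rsp */
    uint64_t valA = cpu->reg[RSP];
    uint64_t valB = cpu->reg[RSP];
    /* Execute */
    uint64_t valE = valB + 8;
    /* Memory: read at the OLD stack pointer (valA), not at valE */
    uint64_t valM = m8_read(cpu, valA);
    /* Write back: two register updates */
    cpu->reg[RSP] = valE;
    cpu->reg[rA]  = valM;
    /* PC update */
    cpu->pc = valP;
}
```

Writing valM after valE means that `popq %rsp` ends with the popped value in %rsp, which is one reasonable ordering for this sketch.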



# Executing Conditional Moves

- Fetch
  - Read 2 bytes
- Decode
  - Read operand registers
- Execute
  - If !cond, then set destination register to 0xF
- Memory
  - Do nothing
- Write back
  - Update register (or not)
- PC Update
  - Increment PC by 2

### Stage Computation: Cond Move

| Stage      | cmovXX rA, rB                          |                                    |
|------------|----------------------------------------|------------------------------------|
| Fetch      | icode:ifun $\leftarrow M_1[PC]$        | Read instruction byte              |
|            | $rA:rB \leftarrow M_1[PC+1]$           | Read register byte                 |
|            | $valP \leftarrow PC+2$                 | Compute next PC                    |
| Decode     | $valA \leftarrow R[rA]$                | Read operand A                     |
|            | $valB \leftarrow 0$                    |                                    |
| Execute    | $valE \leftarrow valB + valA$          | Pass valA through ALU              |
|            | If !Cond(CC,ifun) $rB \leftarrow 0xF$  | (Disable register update if !Cond) |
| Memory     |                                        |                                    |
| Write back | $R[rB] \leftarrow valE$                | Write back result                  |
| PC update  | $PC \leftarrow valP$                   | Update PC                          |

- Read register rA and pass through ALU
- Cancel move by setting destination register to 0xF
  - If condition codes & move condition indicate no move
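The cancel-by-0xF trick is easy to see in code. In the sketch below, `cmov_apply` is a made-up name and `cnd` is assumed to be the already-computed result of Cond(CC, ifun).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RNONE 0xF   /* register ID meaning "no register" */

/* Decode, execute, and write-back steps of cmovXX rA, rB. */
void cmov_apply(uint64_t *reg, uint8_t rA, uint8_t rB, bool cnd) {
    assert(rA < 15 && rB < 15);
    /* Decode */
    uint64_t valA = reg[rA];
    uint64_t valB = 0;
    /* Execute: pass valA through the ALU by adding 0 */
    uint64_t valE = valB + valA;
    /* Cancel the move by redirecting the destination to 0xF */
    uint8_t dstE = cnd ? rB : RNONE;
    /* Write back: a write to 0xF is simply not performed */
    if (dstE != RNONE)
        reg[dstE] = valE;
}
```

The write-back logic itself never looks at the condition; it only sees a destination ID, so no separate "skip this write" signal is needed.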

# **Executing Jumps**



- Fetch
  - Read 9 bytes
  - Increment PC by 9
- Decode
  - Do nothing
- Execute
  - Determine whether to take branch based on jump condition and condition codes
- Memory
  - Do nothing
- Write back
  - Do nothing
- PC Update
  - Set PC to Dest if branch taken, or to incremented PC if not taken

## Stage Computation: Jumps

| Stage      | jXX Dest                          |                          |
|------------|-----------------------------------|--------------------------|
| Fetch      | icode:ifun $\leftarrow M_1[PC]$   | Read instruction byte    |
|            | $valC \leftarrow M_8[PC+1]$       | Read destination address |
|            | $valP \leftarrow PC+9$            | Fall through address     |
| Decode     |                                   |                          |
| Execute    | Cnd $\leftarrow$ Cond(CC,ifun)    | Take branch?             |
| Memory     |                                   |                          |
| Write back |                                   |                          |
| PC update  | $PC \leftarrow$ Cnd ? valC : valP | Update PC                |

- Compute both addresses
- Choose based on setting of condition codes and branch condition
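One way to write Cond(CC, ifun) and the PC selection in C is sketched below. The struct and function names are illustrative. The ifun values 0 through 6 follow the Y86-64 encoding for jmp, jle, jl, je, jne, jge, and jg.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool zf, sf, of; } CC;

/* Cond(CC, ifun): does the branch (or move) condition hold? */
bool cond(CC cc, uint8_t ifun) {
    assert(ifun <= 6);
    bool lt = (cc.sf != cc.of);          /* SF xor OF */
    switch (ifun) {
    case 0:  return true;                /* jmp: unconditional */
    case 1:  return lt || cc.zf;         /* jle */
    case 2:  return lt;                  /* jl  */
    case 3:  return cc.zf;               /* je  */
    case 4:  return !cc.zf;              /* jne */
    case 5:  return !lt;                 /* jge */
    default: return !lt && !cc.zf;       /* jg  */
    }
}

/* PC update for jXX Dest: PC <- Cnd ? valC : valP */
uint64_t jxx_new_pc(uint64_t pc, uint64_t valC, CC cc, uint8_t ifun) {
    uint64_t valP = pc + 9;              /* fall-through address */
    bool cnd = cond(cc, ifun);
    return cnd ? valC : valP;
}
```

Both candidate addresses are available before the condition is known; the condition only drives the final selection.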

# Executing call



- Fetch
  - Read 9 bytes
  - Increment PC by 9
- Decode
  - Read stack pointer
- Execute
  - Decrement stack pointer by 8
- Memory
  - Write incremented PC to new value of stack pointer
- Write back
  - Update stack pointer
- PC Update
  - Set PC to Dest

### Stage Computation: call

| Stage      | call Dest                       |                               |
|------------|---------------------------------|-------------------------------|
| Fetch      | icode:ifun $\leftarrow M_1[PC]$ | Read instruction byte         |
|            | $valC \leftarrow M_8[PC+1]$     | Read destination address      |
|            | $valP \leftarrow PC+9$          | Compute return point          |
| Decode     | $valB \leftarrow R[\%rsp]$      | Read stack pointer            |
| Execute    | $valE \leftarrow valB + (-8)$   | Decrement stack pointer       |
| Memory     | $M_8[valE] \leftarrow valP$     | Write return address on stack |
| Write back | $R[\%rsp] \leftarrow valE$      | Update stack pointer          |
| PC update  | $PC \leftarrow valC$            | Set PC to destination         |

- Use ALU to decrement stack pointer
- Store incremented PC
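A C sketch of call, using the same kind of hypothetical `Cpu` model as a stand-in for the hardware (64-byte memory, helper names invented for illustration):

```c
#include <assert.h>
#include <stdint.h>

enum { RSP = 4, MEM_SIZE = 64 };   /* %rsp has register ID 4 */

typedef struct {
    uint64_t pc;
    uint64_t reg[15];
    uint8_t  mem[MEM_SIZE];
} Cpu;

/* M8[addr]: read an 8-byte little-endian word. */
uint64_t m8_read(const Cpu *cpu, uint64_t addr) {
    assert(addr <= MEM_SIZE - 8);
    uint64_t word = 0;
    for (int i = 7; i >= 0; i--)
        word = (word << 8) | cpu->mem[addr + i];
    return word;
}

/* M8[addr] <- word: write an 8-byte little-endian word. */
void m8_write(Cpu *cpu, uint64_t addr, uint64_t word) {
    assert(addr <= MEM_SIZE - 8);
    for (int i = 0; i < 8; i++)
        cpu->mem[addr + i] = (uint8_t)(word >> (8 * i));
}

void step_call(Cpu *cpu) {
    /* Fetch: instruction byte plus 8-byte destination */
    assert(cpu->mem[cpu->pc] == 0x80);
    uint64_t valC = m8_read(cpu, cpu->pc + 1);
    uint64_t valP = cpu->pc + 9;          /* return point */
    /* Decode */
    uint64_t valB = cpu->reg[RSP];
    /* Execute: valB + (-8) */
    uint64_t valE = valB - 8;
    /* Memory: push the return address at the NEW stack pointer */
    m8_write(cpu, valE, valP);
    /* Write back */
    cpu->reg[RSP] = valE;
    /* PC update */
    cpu->pc = valC;
}
```

The value pushed is valP, the address of the instruction after the call, so a later ret resumes there.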

# Executing ret

ret is encoded as 9 0 (return: xx xx marks the instruction at the return address)

- Fetch
  - Read 1 byte
- Decode
  - Read stack pointer
- Execute
  - Increment stack pointer by 8
- Memory
  - Read return address from old stack pointer
- Write back
  - Update stack pointer
- PC Update
  - Set PC to return address

### Stage Computation: ret

| Stage      | ret                             |                            |
|------------|---------------------------------|----------------------------|
| Fetch      | icode:ifun $\leftarrow M_1[PC]$ | Read instruction byte      |
| Decode     | $valA \leftarrow R[\%rsp]$      | Read operand stack pointer |
|            | $valB \leftarrow R[\%rsp]$      | Read operand stack pointer |
| Execute    | $valE \leftarrow valB + 8$      | Increment stack pointer    |
| Memory     | $valM \leftarrow M_8[valA]$     | Read return address        |
| Write back | $R[\%rsp] \leftarrow valE$      | Update stack pointer       |
| PC update  | $PC \leftarrow valM$            | Set PC to return address   |

- Use ALU to increment stack pointer
- Read return address from memory
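For comparison, ret can be sketched the same way. Names and the 64-byte memory are assumptions of the sketch. The structure mirrors popq, except that the value read from the stack goes to the PC rather than to a register.

```c
#include <assert.h>
#include <stdint.h>

enum { RSP = 4, MEM_SIZE = 64 };   /* %rsp has register ID 4 */

typedef struct {
    uint64_t pc;
    uint64_t reg[15];
    uint8_t  mem[MEM_SIZE];
} Cpu;

/* M8[addr]: read an 8-byte little-endian word. */
uint64_t m8_read(const Cpu *cpu, uint64_t addr) {
    assert(addr <= MEM_SIZE - 8);
    uint64_t word = 0;
    for (int i = 7; i >= 0; i--)
        word = (word << 8) | cpu->mem[addr + i];
    return word;
}

void step_ret(Cpu *cpu) {
    /* Fetch: a single instruction byte */
    assert(cpu->mem[cpu->pc] == 0x90);
    /* Decode: both read ports fetch %rsp */
    uint64_t valA = cpu->reg[RSP];
    uint64_t valB = cpu->reg[RSP];
    /* Execute */
    uint64_t valE = valB + 8;
    /* Memory: return address sits at the OLD stack pointer */
    uint64_t valM = m8_read(cpu, valA);
    /* Write back */
    cpu->reg[RSP] = valE;
    /* PC update */
    cpu->pc = valM;
}
```

No valP is needed here, because the next PC never depends on the length of the ret instruction itself.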

#### **Computation Steps**

| Stage      | Signal      | OPq rA, rB                       | General step               |
|------------|-------------|----------------------------------|----------------------------|
| Fetch      | icode, ifun | icode:ifun $\leftarrow M_1[PC]$  | Read instruction byte      |
|            | rA, rB      | $rA:rB \leftarrow M_1[PC+1]$     | Read register byte         |
|            | valC        |                                  | [Read constant word]       |
|            | valP        | $valP \leftarrow PC+2$           | Compute next PC            |
| Decode     | valA, srcA  | $valA \leftarrow R[rA]$          | Read operand A             |
|            | valB, srcB  | $valB \leftarrow R[rB]$          | Read operand B             |
| Execute    | valE        | $valE \leftarrow valB$ OP $valA$ | Perform ALU operation      |
|            | Cond code   | Set CC                           | Set/use cond. code reg     |
| Memory     | valM        |                                  | [Memory read/write]        |
| Write back | dstE        | $R[rB] \leftarrow valE$          | Write back ALU result      |
|            | dstM        |                                  | [Write back memory result] |
| PC update  | PC          | $PC \leftarrow valP$             | Update PC                  |

- All instructions follow same general pattern
- Differ in what gets computed on each step

#### **Computation Steps**

| Stage      | Signal      | call Dest                       | General step               |
|------------|-------------|---------------------------------|----------------------------|
| Fetch      | icode, ifun | icode:ifun $\leftarrow M_1[PC]$ | Read instruction byte      |
|            | rA, rB      |                                 | [Read register byte]       |
|            | valC        | $valC \leftarrow M_8[PC+1]$     | Read constant word         |
|            | valP        | $valP \leftarrow PC+9$          | Compute next PC            |
| Decode     | valA, srcA  |                                 | [Read operand A]           |
|            | valB, srcB  | $valB \leftarrow R[\%rsp]$      | Read operand B             |
| Execute    | valE        | $valE \leftarrow valB + (-8)$   | Perform ALU operation      |
|            | Cond code   |                                 | [Set/use cond. code reg]   |
| Memory     | valM        | $M_8[valE] \leftarrow valP$     | Memory read/write          |
| Write back | dstE        | $R[\%rsp] \leftarrow valE$      | Write back ALU result      |
|            | dstM        |                                 | [Write back memory result] |
| PC update  | PC          | $PC \leftarrow valC$            | Update PC                  |

- All instructions follow same general pattern
- Differ in what gets computed on each step
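One place where the shared pattern shows up is the computation of valP: the instruction length depends only on whether a register byte and a constant word are present. The sketch below is illustrative (the helper names are chosen here, not taken from the slides) and uses the standard Y86-64 instruction codes 0x0 through 0xB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Y86-64 instruction codes (high 4 bits of the instruction byte). */
enum { IHALT, INOP, IRRMOVQ, IIRMOVQ, IRMMOVQ, IMRMOVQ,
       IOPQ, IJXX, ICALL, IRET, IPUSHQ, IPOPQ };

/* Does the instruction include a register byte (rA:rB)? */
bool need_regids(uint8_t icode) {
    return icode == IRRMOVQ || icode == IOPQ    || icode == IPUSHQ  ||
           icode == IPOPQ   || icode == IIRMOVQ || icode == IRMMOVQ ||
           icode == IMRMOVQ;
}

/* Does the instruction include an 8-byte constant word (valC)? */
bool need_valC(uint8_t icode) {
    return icode == IIRMOVQ || icode == IRMMOVQ || icode == IMRMOVQ ||
           icode == IJXX    || icode == ICALL;
}

/* valP = PC + 1 (instruction byte) + optional register byte + optional valC */
uint64_t next_pc(uint64_t pc, uint8_t icode) {
    assert(icode <= IPOPQ);
    return pc + 1 + (need_regids(icode) ? 1 : 0) + (need_valC(icode) ? 8 : 0);
}
```

This reproduces the byte counts used throughout: 2 for OPq, cmovXX, and popq, 10 for rmmovq, 9 for jXX and call, and 1 for ret.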

# Summary of Computed Values

#### Fetch

- icode Instruction code
- ifun Instruction function
- rA Instr. Register A
- rB Instr. Register B
- valC Instruction constant
- valP Incremented PC

#### Decode

- srcA Register ID A
- srcB Register ID B
- dstE Destination Register E
- dstM Destination Register M
- valA Register value A
- valB Register value B

#### Execute

- valE ALU result
- Cnd Branch/move flag

#### Memory

- valM Value from memory