Skip to content

4.5

:material-circle-edit-outline: 约 471 个字 :fontawesome-solid-code: 14 行代码 :material-clock-time-two-outline: 预计阅读时间 2 分钟

Instruction-Level Parallelism (ILP)

指令集并行

  • To increase ILP
    • Deeper pipeline, as 5 stages in RISCV
  • Multiple issue
    • 多条流水线组成一个处理器
    • Start multiple instructions per clock cycle

多发 - Multiple issue ,同一时间安排多条指令

单周期CPI>1,流水新CPI=1,多发处理器CPI<1

IPC是CPI的倒数,用于描述多发处理器的性能

Static Multiple Issue

这一部分重新看PPT Orga_Ch4_Part2_V1.0(2).pdf 80

如何同时安排多条inst开始跑,我们有两种方法

  • Static multiple issue
    • Compiler groups instructions to be issued together
    • Packages them into “issue slots
    • Compiler detects and avoids hazards
  • Dynamic multiple issue
    • CPU examines instruction stream and chooses instructions to issue each cycle
    • CPU resolves hazards

为了实现多路安排,我们需要前瞻——或者说预测—— insts 之间的依赖关系,且需要预测到很远,比prediction远很多。我们称之为Speculation

Speculation and Exceptions

RISCV用的静态双发,而且两条路被加了限制:

  • One ALU/branch instruction
  • One load/store instruction

即两条只能限制运行自己约定内的指令类型

RISC-V with Static Dual Issue:

image-20240516141320751

和流水线相比,蓝线是新增的,很多东西都double了

有些小细节,一个是新增的ALU没有ALUSrc的mux

一个是Mem没有mux了

Hazards in the Dual-Issue

  • EX data hazard
    • Now can’t use ALU result in load/store in same packet
    • Split into two packets, effectively a stall
add x10, x0, x1
ld x2, 0(x10)
  • Load-use hazard
    • Still one cycle use latency, but now two instructions

Compiler Schedule Example:

Loop: 
    ld x31, 0(x20)      // x31 = array element
    add x31, x31, x21   // add scalar in x21
    sd x31, 0(x20)      // store result
    addi x20, x20, -8   // decrement pointer
    blt x22, x20, Loop  // branch if x22 < x20

image-20240516142133457

IPC = 5/4 = 1.25 (c.f. peak IPC = 2)

Loop Unrolling

一个场景:循环体展开

对于一个循环,交给两路去跑,最后两路合并

Use different registers per replication

还是上面那个Loop

image-20240516142723964

IPC = 14/8 = 1.75

Dynamic Multiple Issue

静态是编译器先处理好程序后再交给硬件跑

动态是直接交给硬件实时判断,所以硬件有大改

我们引入一个概念叫乱序执行

Allow the CPU to execute instructions out of order to avoid stalls

But commit result to registers in order

例如:

ld x31,20(x21)
add x1,x31,x2
sub x23,x23,x3
andi x5,x23,20

#Can start sub while add is waiting for ld

新的硬件设计:

image-20240516143401459