
HW4


[!ABSTRACT]

俞仲炜 3220104929 241216

1

a

\(10\ \text{processors} \times 8\ \text{lanes} \times 1.5\ \text{GHz} \times (1-20\%) \times 70\% \times 0.85 = 57.12\ \text{GFLOP/s}\)

b

(1) \(\text{speedup} = \dfrac{10\ \text{processors} \times 16\ \text{lanes} \times 1.5\ \text{GHz} \times (1-20\%) \times 70\% \times 0.85}{10\ \text{processors} \times 8\ \text{lanes} \times 1.5\ \text{GHz} \times (1-20\%) \times 70\% \times 0.85} = 2\)

(2) \(\text{speedup} = \dfrac{15\ \text{processors} \times 8\ \text{lanes} \times 1.5\ \text{GHz} \times (1-20\%) \times 70\% \times 0.85}{10\ \text{processors} \times 8\ \text{lanes} \times 1.5\ \text{GHz} \times (1-20\%) \times 70\% \times 0.85} = 1.5\)

(3) \(\text{speedup} = \dfrac{10\ \text{processors} \times 8\ \text{lanes} \times 1.5\ \text{GHz} \times (1-20\%) \times 70\% \times 0.95}{10\ \text{processors} \times 8\ \text{lanes} \times 1.5\ \text{GHz} \times (1-20\%) \times 70\% \times 0.85} = \dfrac{0.95}{0.85} \approx 1.12\)
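As a quick check of the arithmetic, the sketch below multiplies the same factors in Python and recomputes each speedup by changing one factor at a time. The function name `throughput_gflops` and its parameter names are hypothetical; the factor values are exactly the ones used in the formulas above.

```python
def throughput_gflops(processors=10, lanes=8, clock_ghz=1.5,
                      util=1 - 0.20, arith_frac=0.70, issue_rate=0.85):
    """GFLOP/s as the product of the factors in the formula above.

    `util` is the (1 - 20%) factor; all names here are hypothetical.
    """
    return processors * lanes * clock_ghz * util * arith_frac * issue_rate


base = throughput_gflops()
speedups = {
    "16 lanes": throughput_gflops(lanes=16) / base,
    "15 processors": throughput_gflops(processors=15) / base,
    "issue rate 0.95": throughput_gflops(issue_rate=0.95) / base,
}

print(f"baseline: {base:.2f} GFLOP/s")
for name, s in speedups.items():
    print(f"{name}: {s:.2f}x")
```

Because every unchanged factor cancels, each speedup reduces to the ratio of the one factor that changed: 16/8, 15/10 and 0.95/0.85.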

2

512 bit = 8 x 8 byte, so one vector register holds 8 elements

524 / 8 = 65.5, rounded up to 66 vector iterations

66 x 4 = 264 clock cycles
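The same count in a few lines of Python, assuming (as the answer does) 8-byte elements and 4 cycles per vector iteration. The constant names are hypothetical; integer ceiling division handles the partially filled last iteration.

```python
REG_BITS = 512        # vector register width in bits
ELEM_BYTES = 8        # 8-byte (64-bit) elements, as in the answer
N_ELEMS = 524         # number of elements to process
CYCLES_PER_ITER = 4   # cycles per vector iteration, as used in the answer

elems_per_reg = REG_BITS // (ELEM_BYTES * 8)                 # 512 / 64 = 8
iterations = (N_ELEMS + elems_per_reg - 1) // elems_per_reg  # ceil(65.5) = 66
total_cycles = iterations * CYCLES_PER_ITER                  # 66 * 4 = 264

print(elems_per_reg, iterations, total_cycles)
```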

3

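A: loads and stores within the same process are guaranteed to execute in order, which is what makes the mutex lock work

B: Dynamic scheduling is an ILP technique that reorders instructions

A: a superscalar processor executes multiple instructions in parallel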

4

(a) C0.L0: (S, AC20, 0020)

(b) C0.L0: (M, AC20, 0080) C3.L0: (I, AC20, 0020)

(c) C3.L0: (M, AC20, 0080)

(d) C1.L2: (S, AC10, 0010)

(e) C0.L1: (M, AC08, 0048) C3.L1: (I, AC08, 0008)

(f) C0.L2: (M, AC30, 0078); memory M: AC10 ← 0030 (write-back of the replaced block)

The problem mentions a part (g), but it seems to have been cut off in the PDF. According to the textbook, (g) is C3: W, AC30 <-- 78

The resulting change is C3.L2: (M, AC30, 0078)
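The transitions in (b) and (f) follow the usual MSI rule for a write: the writer's line ends in M, other valid copies of the block are invalidated, and a dirty block that gets replaced is written back first. The sketch below is a simplified model of only that rule (writes only, no read handling, no bus timing). The `Line` class, the `write` helper and the starting cache contents are hypothetical, chosen so that the final state reproduces answers (b) and (f).

```python
from dataclasses import dataclass


@dataclass
class Line:
    state: str  # "M", "S" or "I"
    tag: int    # block address currently held by the line
    data: int


def write(caches, memory, writer, index, addr, value):
    """Simplified MSI snooping write: core `writer` stores `value` to `addr`.

    Hypothetical model: all private caches share one direct-mapped geometry,
    so `addr` maps to the same line `index` in every cache.
    """
    victim = caches[writer][index]
    # Replacing a different, dirty block: write it back to memory first.
    if victim.tag != addr and victim.state == "M":
        memory[victim.tag] = victim.data
    # Snooped invalidate: every other valid copy of addr becomes I.
    for core, lines in caches.items():
        if core == writer:
            continue
        other = lines[index]
        if other.tag == addr and other.state != "I":
            if other.state == "M":
                memory[addr] = other.data  # a dirty owner flushes before invalidating
            lines[index] = Line("I", other.tag, other.data)
    # The writer now holds the only valid copy, in state M.
    caches[writer][index] = Line("M", addr, value)


# Hypothetical starting contents, chosen only to mirror answers (b) and (f).
memory = {}
caches = {
    "C0": {0: Line("S", 0xAC00, 0x0010), 2: Line("M", 0xAC10, 0x0030)},
    "C3": {0: Line("S", 0xAC20, 0x0020), 2: Line("I", 0xAC10, 0x0010)},
}

write(caches, memory, "C0", 0, 0xAC20, 0x0080)  # mirrors the state in (b)
write(caches, memory, "C0", 2, 0xAC30, 0x0078)  # mirrors the state in (f)

for core, lines in caches.items():
    for idx, line in sorted(lines.items()):
        print(f"{core}.L{idx}: ({line.state}, {line.tag:04X}, {line.data:04X})")
print({f"{a:04X}": f"{d:04X}" for a, d in memory.items()})
```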

5

a

The pseudocode for one loop iteration is roughly:

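```
# load the inputs
vld a_re[i]
vld b_re[i]
vld a_im[i]
vld b_im[i]

# real part: a0 = a_re*b_re - a_im*b_im
vmul t0, a_re[i], b_re[i]
vmul t1, a_im[i], b_im[i]
vsub a0, t0, t1

# imaginary part: a1 = a_re*b_im + a_im*b_re
vmul t0, a_re[i], b_im[i]
vmul t1, a_im[i], b_re[i]
vadd a1, t0, t1

# store the real and imaginary results
vsd a0
vsd a1
```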

Based on the data dependences, and with at most one memory instruction per chime, the vector operations can be grouped into 6 chimes:

```
#1
vld a_im
vmul t0, a_re, b_re
#2
vld b_im
vmul t1, a_im, b_im
#3
vsub a0, t0, t1
vsd a0
#4
vld a_re
vmul t0, a_re, b_im
#5
vld b_re
vmul t1, a_im, b_re
#6
vadd a1, t0, t1
vsd a1
```

Cycles per complex result value:

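\((6 \times 64 + 15 \times 6 + 8 \times 4 + 5 \times 2)/128 = 516/128 \approx 4.03\) cycles

Here \(6 \times 64\) is 6 chimes of 64 elements each; the start-up terms cover the 6 memory instructions (4 `vld`, 2 `vsd`) at 15 cycles each, the 4 `vmul` at 8 cycles each, and the `vsub` and `vadd` at 5 cycles each; dividing by 128 spreads the total over the 64 real and 64 imaginary values produced.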
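A small helper reproduces this arithmetic, taking the start-up costs from the formula (15, 8 and 5 cycles) as defaults and a vector length of 64. The name `cycles_per_result` and its parameters are hypothetical.

```python
MVL = 64  # elements per vector instruction, as in the 6 x 64 term


def cycles_per_result(chimes, mem_ops, muls, addsubs,
                      mem_startup=15, mul_startup=8, add_startup=5):
    """Chime time plus start-up overheads for one strip, and that total
    divided over the 2 * MVL values (real and imaginary) it produces."""
    total = (chimes * MVL
             + mem_startup * mem_ops
             + mul_startup * muls
             + add_startup * addsubs)
    return total, total / (2 * MVL)


total, per_result = cycles_per_result(chimes=6, mem_ops=6, muls=4, addsubs=2)
print(total, round(per_result, 2))
```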

b

```
#1
vld a_re
vld b_re
vmul t0, a_re, b_re
#2
vld a_im
vld b_im
vmul t1, a_im, b_im
#3
vsub a0, t0, t1
vsd a0
#4
vmul t0, a_re, b_im
#5
vmul t1, a_im, b_re
#6
vadd a1, t0, t1
vsd a1
```

Cycles per result:

\((6 \times 64 + 15 \times 6 + 8 \times 4 + 5 \times 2)/128 = 516/128 \approx 4.03\) cycles
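Both chime groupings can be sanity-checked mechanically. The sketch below counts how many instructions of each kind every chime issues, rejects a chime that exceeds an assumed per-unit limit, and totals the operations. The limits are assumptions for illustration only (one multiply unit, one add/subtract unit, one memory pipeline for part a and three for part b), and the checker ignores data dependences and chaining; `tally` and `UNIT` are hypothetical names.

```python
from collections import Counter

# Functional unit used by each pseudo-instruction in the listings above.
UNIT = {"vld": "mem", "vsd": "mem", "vmul": "mul", "vadd": "add", "vsub": "add"}


def tally(chimes, mem_pipes):
    """Check assumed per-chime unit limits and return total ops per unit."""
    limits = {"mem": mem_pipes, "mul": 1, "add": 1}
    totals = Counter()
    for i, chime in enumerate(chimes, start=1):
        used = Counter(UNIT[op] for op in chime)
        for unit, n in used.items():
            if n > limits[unit]:
                raise ValueError(f"chime #{i} needs {n} '{unit}' units")
        totals += used
    return totals


part_a = [["vld", "vmul"], ["vld", "vmul"], ["vsub", "vsd"],
          ["vld", "vmul"], ["vld", "vmul"], ["vadd", "vsd"]]
part_b = [["vld", "vld", "vmul"], ["vld", "vld", "vmul"], ["vsub", "vsd"],
          ["vmul"], ["vmul"], ["vadd", "vsd"]]

for name, chimes, pipes in [("a", part_a, 1), ("b", part_b, 3)]:
    print(name, len(chimes), dict(tally(chimes, pipes)))
```

Both listings come out to 6 chimes, 6 memory instructions, 4 multiplies and 2 add/subtracts, which is why the same formula, and the same ≈ 4.03 cycles, appears in both parts.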