HW4
[!ABSTRACT]
俞仲炜 3220104929 241216
1
a
\(10\ \text{processors} \times 8\ \text{lanes} \times 1.5\ \text{GHz} \times (1-20\%) \times 70\% \times 0.85 = 57.12\ \text{GFLOP/s}\)
b
(1) \(\text{speedup} = \frac{10\ \text{processors} \times 16\ \text{lanes} \times 1.5\ \text{GHz} \times (1-20\%) \times 70\% \times 0.85}{10\ \text{processors} \times 8\ \text{lanes} \times 1.5\ \text{GHz} \times (1-20\%) \times 70\% \times 0.85} = 2\)
(2) \(\text{speedup} = \frac{15\ \text{processors} \times 8\ \text{lanes} \times 1.5\ \text{GHz} \times (1-20\%) \times 70\% \times 0.85}{10\ \text{processors} \times 8\ \text{lanes} \times 1.5\ \text{GHz} \times (1-20\%) \times 70\% \times 0.85} = 1.5\)
(3) \(\text{speedup} = \frac{10\ \text{processors} \times 8\ \text{lanes} \times 1.5\ \text{GHz} \times (1-20\%) \times 70\% \times 0.95}{10\ \text{processors} \times 8\ \text{lanes} \times 1.5\ \text{GHz} \times (1-20\%) \times 70\% \times 0.85} = \frac{0.95}{0.85} \approx 1.12\)
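The arithmetic in parts a and b can be checked with a short Python sketch. The parameter names (`active_fraction`, `fp_fraction`, `issue_rate`) are labels chosen here for the factors \((1-20\%)\), \(70\%\), and \(0.85\); they are assumptions for illustration, not terms taken from the problem statement.

```python
def simd_throughput_gflops(processors, lanes, clock_ghz,
                           active_fraction=0.8, fp_fraction=0.7,
                           issue_rate=0.85):
    """Return the product used in part a, in GFLOP/s."""
    return (processors * lanes * clock_ghz
            * active_fraction * fp_fraction * issue_rate)


# Part a: the baseline configuration.
baseline = simd_throughput_gflops(10, 8, 1.5)

# Part b: each option changes exactly one factor of the baseline.
options = {
    "(1) 16 lanes": simd_throughput_gflops(10, 16, 1.5),
    "(2) 15 processors": simd_throughput_gflops(15, 8, 1.5),
    "(3) issue rate 0.95": simd_throughput_gflops(10, 8, 1.5, issue_rate=0.95),
}

print(f"baseline: {baseline:.2f} GFLOP/s")
for name, value in options.items():
    print(f"{name}: speedup {value / baseline:.2f}")
```

Because every option changes a single factor, each speedup reduces to the ratio of that factor alone: 16/8, 15/10, and 0.95/0.85.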
2
512 bits = 8 × 8 bytes
524 / 8 = 65.5, which rounds up to 66
66 × 4 = 264 clock cycles
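The same count can be reproduced in a few lines of Python. This is a minimal sketch that assumes the reading used above: 8-byte elements, 524 elements in total, and 4 clock cycles per 512-bit operation. The function and parameter names are hypothetical.

```python
import math


def vector_cycles(n_elements, vector_bits=512, element_bytes=8,
                  cycles_per_op=4):
    """Cycles needed to process n_elements with fixed-width vector ops."""
    # Number of elements that fit in one vector operation (512 / 64 = 8).
    elements_per_op = vector_bits // (8 * element_bytes)
    # A partially filled final operation still costs a full operation.
    ops = math.ceil(n_elements / elements_per_op)
    return ops * cycles_per_op


print(vector_cycles(524))
```

The `math.ceil` call is the step that turns 65.5 into 66: the last operation is only half full but still takes the full 4 cycles.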
3
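A. Guaranteeing that loads and stores within the same process execute in order is what forms the mutual-exclusion lock.
B. Dynamic scheduling is an ILP technique that reorders instructions.
A. A superscalar processor executes multiple instructions in parallel.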
4
(a) C0.L0: (S, AC20, 0020)
(b) C0.L0: (M, AC20, 0080) C3.L0: (I, AC20, 0020)
(c) C3.L0: (M, AC20, 0080)
(d) C1.L2: (S, AC10, 0010)
(e) C0.L1: (M, AC08, 0048) C3.L1: (I, AC08, 0008)
(f) C0.L2: (M, AC30, 0078) M: AC10, 0030
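The problem mentions a part (g), but it appears to have been cut off in the PDF. According to the textbook it is (g) C3: W, AC30 <-- 78
The resulting change is C3.L2: (M, AC30, 0078)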
5
a
The pseudocode for one loop iteration is roughly as follows:
vld a_re[i]
vld b_re[i]
vld a_im[i]
vld b_im[i]
vmul t0, a_re[i], b_re[i]
vmul t1, a_im[i], b_im[i]
vsub a0, t0, t1
vmul t0, a_re[i], b_im[i]
vmul t1, a_im[i], b_re[i]
vadd a1, t0, t1
vsd a0
vsd a1
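For reference, the scalar computation that this vector pseudocode performs can be sketched in Python. The function name and the use of plain lists are assumptions for illustration; `c_re` and `c_im` correspond to `a0` and `a1` in the pseudocode.

```python
def complex_multiply(a_re, a_im, b_re, b_im):
    """Element-wise complex product of two arrays stored as re/im parts."""
    c_re, c_im = [], []
    for ar, ai, br, bi in zip(a_re, a_im, b_re, b_im):
        # Real part: two multiplies and one subtract (vmul, vmul, vsub).
        c_re.append(ar * br - ai * bi)
        # Imaginary part: two multiplies and one add (vmul, vmul, vadd).
        c_im.append(ar * bi + ai * br)
    return c_re, c_im


# (1 + 2i) * (3 + 4i) = -5 + 10i
print(complex_multiply([1], [2], [3], [4]))
```

Each element therefore needs four loads, four multiplies, one subtract, one add, and two stores, which matches the twelve vector instructions listed above.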
Based on the data dependences, the instructions can be split into 6 chimes, each a group of vector operations:
#1
vld a_im
vmul t0, a_re, b_re
#2
vld b_im
vmul t1, a_im, b_im
#3
vsub a0, t0, t1
vsd a0
#4
vld a_re
vmul t0, a_re, b_im
#5
vld b_re
vmul t1, a_im, b_re
#6
vadd a1, t0, t1
vsd a1
Computing one complex result value takes:
\((6 \times 64 + 15 \times 6 + 8 \times 4 + 5 \times 2)/128 = 516/128 \approx 4.03\) cycles
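The cycle count can be broken down term by term in Python. This sketch assumes one reading of the formula: 6 chimes over 64-element vectors, a 15-cycle start-up overhead for each of the 6 memory operations (4 loads, 2 stores), 8 cycles for each of the 4 multiplies, 5 cycles for each of the 2 add/subtract operations, and 128 result values per iteration (64 real plus 64 imaginary). All names are hypothetical.

```python
def cycles_per_result(chimes=6, vector_length=64,
                      mem_ops=6, mem_overhead=15,
                      mul_ops=4, mul_overhead=8,
                      addsub_ops=2, addsub_overhead=5,
                      results=128):
    """Return (total cycles per iteration, cycles per result value)."""
    total = (chimes * vector_length          # one pass of the vector per chime
             + mem_ops * mem_overhead        # start-up cost of loads/stores
             + mul_ops * mul_overhead        # start-up cost of multiplies
             + addsub_ops * addsub_overhead) # start-up cost of add/subtract
    return total, total / results


total, per_result = cycles_per_result()
print(total, round(per_result, 2))
```

Under these assumptions the overhead terms add only 132 cycles to the 384 cycles of chime time, so the result is dominated by the number of chimes.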
b
#1
vld a_re
vld b_re
vmul t0, a_re, b_re
#2
vld a_im
vld b_im
vmul t1, a_im, b_im
#3
vsub a0, t0, t1
vsd a0
#4
vmul t0, a_re, b_im
#5
vmul t1, a_im, b_re
#6
vadd a1, t0, t1
vsd a1
Computing one result takes:
\((6 \times 64 + 15 \times 6 + 8 \times 4 + 5 \times 2)/128 = 516/128 \approx 4.03\) cycles