
Lecture 11 | Memory Hierarchy - Cache basics


Computer Organization, 2024-05-07, periods 6-8 (zju.edu.cn)

Orga_Ch5_V1.1(2).pdf

5.3 The basics of caches

The instruction memory and data memory in Chapter 4 are in fact two independent caches.

Each piece of data from the lower level appears in the cache at most once; the same data never occupies more than one cache location.

As the figure below shows, data ends up scattered around the cache, so we need to work out how best to address it.

image-20240519195551756

Direct Mapped Cache

Direct mapping.

Direct-mapping algorithm: (Block address) modulo (Number of blocks in the cache)

That is, take the low \(n\) bits of the block address as the location inside the cache; blocks that share those low bits map to the same place.

\(n\) is determined by the cache having \(2^n\) indices.

To tell which memory block currently occupies a given cache block, we use the remaining high-order address bits; these remaining high bits are called the \(tag\).

image-20240519195803438

But some cache entries may not hold any data yet, so we need to mark them; hence the so-called valid bit: 1 means present, 0 means not present, and it is initialized to 0.

The figure below is a complete diagram of a Direct Mapped Cache.

V is the valid bit, the index is the low \(n\) bits, and the width of the byte offset depends on how many bytes a block holds.

physical address = tag + index + byte offset (the block address itself consists of just the tag and index fields)
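
To make the field layout concrete, here is a minimal sketch (my own illustration, not from the lecture) of slicing a 32-bit address into tag, index, and byte offset, assuming \(2^n\) blocks and \(2^m\) words per block:

```python
# Minimal sketch (illustrative, not from the slides): split a 32-bit address
# into tag / index / byte offset for a direct-mapped cache with 2**n blocks
# and 2**m words (2**(m+2) bytes) per block.

def split_address(addr: int, n: int, m: int) -> tuple[int, int, int]:
    offset_bits = m + 2                            # m bits pick the word, 2 bits the byte
    byte_offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << n) - 1)
    tag = addr >> (offset_bits + n)
    return tag, index, byte_offset

# e.g. n = 10 (1024 blocks), m = 2 (4 words per block)
print(split_address(0x1234_5678, n=10, m=2))
```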

image-20240519200820096

Here is an example.

image-20240519201436079

If the block is not in the cache, it is a miss and the block must be read in from memory.

A repeated access will then hit.

If the index matches but the tag does not, it is still a miss, and the tag and data at that index are overwritten.

Here is an example of how the tag bits and cache size are calculated.

image-20240519203910292

n is the number of bits used by the index.

\(2^m\) is the number of words in a block, and the 2 is the number of bits needed to address a byte within a word.

The m + 2 bits are used to find the required byte within a cache block's data: m bits locate the word within the block, and 2 bits locate the byte within the word (a word has 4 bytes, so 2 bits suffice).

This 2 is the byte offset mentioned earlier.

Note that index, m, n, and tag are all concepts about how the physical address supplied by the CPU is decoded by the cache.
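
Putting the field widths together, the total cache storage (data plus tag plus valid bit per entry) can be computed as below; the configuration used here is an assumption for illustration, not the one in the slide:

```python
# Total bits in a direct-mapped cache: 2**n entries, each holding 2**m words
# of data, a tag of (32 - n - m - 2) bits, and 1 valid bit.
# (32-bit addresses assumed; the numbers below are illustrative.)

def cache_total_bits(n: int, m: int, addr_bits: int = 32) -> int:
    data_bits = 2**m * 32
    tag_bits = addr_bits - n - (m + 2)
    return 2**n * (data_bits + tag_bits + 1)

# e.g. n = 10, m = 0 (one word per block): 1024 * (32 + 20 + 1) = 54272 bits
print(cache_total_bits(n=10, m=0))
```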

image-20240519205313980

Here is an example of computing the cache size.

Note that the B in KB means bytes, not bits.

image-20240519205702620

image-20240519211538703

The answer to the question above is 11; it simply asks you to compute the number of index bits.

image-20240519220745947

Handling Cache Hits and Misses

Read misses—two kinds of misses

  • instruction cache miss
    • On a miss, the processor hands the address PC-4 to memory, reads the corresponding instruction back into the instruction cache, and then redoes the operation
  • data cache miss

Write hits

Different Strategies

  • write-back: causes inconsistency
    • only updates the cache, not memory
    • fast, but leads to inconsistency; on the other hand, several writes can be merged into a single write to memory, reducing memory operations
  • write-through: ensures consistency
    • writes both the cache and memory, but is slower

Write misses

A write miss means the block being written has not yet been brought into the cache.

Read the entire block into the cache, then write the word.

Four Questions for Memory Designers

This part goes into detail about how caches are implemented.

  1. Where in the upper level should a lower-level block be placed? (Block placement)
  2. How do we check whether the data is already in the upper level? (Block identification)
  3. Which block should be replaced on a miss? (Block replacement)
  4. How should writes be handled? (Write strategy)

Q1: Block Placement

  • Direct mapped
    • Block can only go in one place in the cache
  • Fully associative
    • Block can go anywhere in cache.
  • Set associative
    • Block can go in one of a set of places in the cache.
    • A set is a group of blocks in the cache.
    • fully associative is m-way set-associative (for a cache with m blocks)
      • Note that direct mapped is the same as 1-way set associative

image-20240519214529034
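
As a rough sketch of the three placement schemes above (cache geometry chosen only for illustration), the candidate locations for a memory block can be listed like this:

```python
# Sketch of block placement: where may memory block `block_addr` live in a
# cache of `num_blocks` blocks, organized as sets of `ways` blocks each?
# ways = 1 -> direct mapped; ways = num_blocks -> fully associative.

def candidate_blocks(block_addr: int, num_blocks: int, ways: int) -> list[int]:
    num_sets = num_blocks // ways
    set_index = block_addr % num_sets          # (block address) modulo (number of sets)
    return [set_index * ways + w for w in range(ways)]

print(candidate_blocks(12, num_blocks=8, ways=1))  # direct mapped: [4]
print(candidate_blocks(12, num_blocks=8, ways=2))  # 2-way set associative: [0, 1]
print(candidate_blocks(12, num_blocks=8, ways=8))  # fully associative: all 8 blocks
```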

Q2: Block Identification

  • Tag
  • Valid bit

The Format of the Physical Address

  • index
    • The set, in case of a set-associative cache
    • The block, in case of a direct-mapped cache
    • Has as many bits as log2(#sets) for set-associative caches, or log2(#blocks) for direct-mapped caches
  • The Byte Offset field selects the desired byte (or word) within the block
  • The Tag is compared with the tag stored in the selected entry (or entries) to decide hit or miss

image-20240519214719573

image-20240519214745528

image-20240519214754833

Parallel comparison is used: all cache entries are checked at the same time for the requested block.

Moreover, the physical address no longer contains an index field.

image-20240519214908460

Here the index selects a set.

But this figure has a problem: what is being addressed should be a set, not a block.

Moreover, once the set is determined, only the entries inside that set need to be checked (valid bit and tag); there is no need to compare them all.
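
A minimal lookup sketch for a set-associative cache, reflecting the correction above (the index selects a set, and only that set's entries are compared); the class and parameter names are my own, not from the lecture:

```python
# Illustrative set-associative lookup: the index picks one set, then only the
# entries of that set have their valid bit and tag checked (in hardware this
# comparison happens in parallel). Names and geometry are assumptions.

class Entry:
    def __init__(self) -> None:
        self.valid = False
        self.tag = 0
        self.data = None

def lookup(sets: list, addr: int, offset_bits: int, index_bits: int):
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    for entry in sets[index]:                 # compare only within the selected set
        if entry.valid and entry.tag == tag:
            return entry                      # hit
    return None                               # miss
```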

Q3: Block Replacement

Direct mapping settles this question immediately (there is only one candidate); the other two schemes need a policy for which block to replace when the set or the cache is full.

  • Random replacement
  • Least-recently used (LRU)
    • evict the block that has gone unused the longest; the block that just came in is not evicted (see the sketch after this list)
  • MRU
    • the opposite of LRU: evict the most recently used block
  • First in,first out(FIFO)
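
A small sketch of LRU bookkeeping for one set (an OrderedDict keeps tags in recency order, most recent last); this is only one possible implementation, not the lecture's design:

```python
# LRU replacement for one set, sketched with an OrderedDict (oldest first).
from collections import OrderedDict

class LRUSet:
    def __init__(self, ways: int) -> None:
        self.ways = ways
        self.entries = OrderedDict()             # tag -> block data

    def access(self, tag: int, load_block):
        if tag in self.entries:                  # hit: mark as most recently used
            self.entries.move_to_end(tag)
            return self.entries[tag]
        if len(self.entries) == self.ways:       # set full: evict the least recently used
            self.entries.popitem(last=False)
        self.entries[tag] = load_block(tag)      # miss: bring the block in
        return self.entries[tag]

s = LRUSet(ways=2)
s.access(1, lambda t: f"block {t}")
s.access(2, lambda t: f"block {t}")
s.access(1, lambda t: f"block {t}")              # hit; tag 2 is now the LRU victim
```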

Q4: Write Strategy

  • write-back cache
    • one issue: data written in the cache has not yet been put back to memory, so it must be protected
    • we add a dirty bit to mark such entries
    • that is, a write-back cache needs one more bit per entry than a write-through cache
    • more efficient, but more complex to design (see the sketch after this list)
  • write-through cache
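
The sketch below contrasts the two write-hit policies and shows where the dirty bit comes in; the data structures are illustrative stand-ins, not the lecture's hardware:

```python
# Write-hit policies, sketched with dicts standing in for cache and memory.
from dataclasses import dataclass

@dataclass
class Entry:
    data: int = 0
    dirty: bool = False      # extra state kept only by a write-back cache
    valid: bool = True

def write_hit_write_through(entry: Entry, word: int, memory: dict, addr: int) -> None:
    entry.data = word        # update the cache ...
    memory[addr] = word      # ... and memory on every write (consistent, slower)

def write_hit_write_back(entry: Entry, word: int) -> None:
    entry.data = word        # update only the cache
    entry.dirty = True       # remember to write the block back later

def evict(entry: Entry, memory: dict, block_addr: int) -> None:
    if entry.dirty:          # only dirty blocks are written back on eviction
        memory[block_addr] = entry.data
    entry.valid = False
```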

Write Stall

  • Write stall -- when the CPU must wait for writes to complete during write-through
    • Write buffers
    • i.e., a small buffer placed between the CPU and memory
    • the CPU writes to the cache and the write buffer at the same time, and the buffer absorbs the write traffic
    • commonly used with write-through caches

Write Misses

  • Write allocate
    • the block is first read from memory into the cache, then the write is performed (see the sketch after this list)
  • Write around / No write allocate
    • do not write the cache; use the write buffer to write directly to memory
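
And the two write-miss policies, again with plain containers as illustrative stand-ins (4-word blocks assumed):

```python
# Write-miss policies, sketched with plain dicts/lists (illustrative only).

def write_miss_allocate(cache: dict, memory: dict, block_addr: int, offset: int, word: int) -> None:
    cache[block_addr] = list(memory[block_addr])      # write allocate: fetch the block first
    cache[block_addr][offset] = word                  # then write the word in the cache

def write_miss_no_allocate(write_buffer: list, block_addr: int, offset: int, word: int) -> None:
    write_buffer.append((block_addr, offset, word))   # write around: skip the cache,
                                                      # the buffer drains to memory

memory = {0: [0, 0, 0, 0]}                            # one 4-word block
cache, buf = {}, []
write_miss_allocate(cache, memory, block_addr=0, offset=2, word=99)
write_miss_no_allocate(buf, block_addr=0, offset=3, word=7)
```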

Memory system

Make reading multiple words easier by using banks of memory
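A common textbook-style illustration of why banks (interleaving) help; the latency numbers here (1 cycle to send the address, 15 cycles per DRAM access, 1 cycle per word transferred) are assumptions for the sake of the arithmetic, not values taken from the slides:

```python
# Miss penalty for fetching a 4-word block under assumed latencies:
# 1 cycle to send the address, 15 cycles per DRAM access, 1 cycle per word transfer.
words = 4
send_addr, dram, xfer = 1, 15, 1

one_word_wide = send_addr + words * dram + words * xfer   # accesses are serialized
interleaved   = send_addr + dram + words * xfer           # 4 banks overlap their accesses

print(one_word_wide)   # 1 + 4*15 + 4*1 = 65 cycles
print(interleaved)     # 1 + 15 + 4*1  = 20 cycles
```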

image-20240519220654454

image-20240519220825710

image-20240519220958269

image-20240519221007631

5.4 Measuring and improving cache performance

How to measure cache performance?

How to improve performance?

Measuring cache performance

Reducing cache misses by more flexible placement of blocks

Reducing the miss penalty using multilevel caches

  • Average Memory Access Time (AMAT) = hit time + miss time = hit time + miss rate × miss penalty (a small worked example follows this list)
  • Hit time: The time to access the upper level of the memory hierarchy, which includes the time needed to determine whether the access is a hit or a miss.
    • find if hit or miss + cache to CPU
  • miss penalty: The time to replace a block in the upper level with the corresponding block from the lower level, plus the time to deliver this block to the processor.
    • mem to cache + cache to CPU
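
A quick worked instance of the AMAT formula, with made-up numbers:

```python
# AMAT = hit time + miss rate * miss penalty (numbers are made up).
hit_time = 1           # cycles, including the hit/miss check
miss_rate = 0.05
miss_penalty = 100     # cycles to refill from the lower level

amat = hit_time + miss_rate * miss_penalty
print(amat)            # 1 + 0.05 * 100 = 6 cycles per access on average
```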

Measuring cache performance

Previously, CPU_time = Inst × CPI × Clock cycle time.

We need to redefine it to account for memory stalls.

"# of"通常表示“数量”或“总数”。

  • CPU time = (CPU execution clock cycles + Memory-stall clock cycles) × Clock cycle time
  • Memory-stall clock cycles = Read-stall cycles + Write-stall cycles = # of memory accesses × miss rate × miss penalty
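
A worked instance of the redefined CPU time, with illustrative numbers (not the ones in the slide's example):

```python
# CPU time including memory stalls (illustrative numbers).
insts             = 1_000_000
base_cpi          = 1.0       # CPI assuming a perfect cache
accesses_per_inst = 1.36      # 1 instruction fetch + ~0.36 loads/stores
miss_rate         = 0.02
miss_penalty      = 100       # cycles
cycle_time        = 1e-9      # seconds

stall_cycles = insts * accesses_per_inst * miss_rate * miss_penalty
cpu_time = (insts * base_cpi + stall_cycles) * cycle_time
print(stall_cycles, cpu_time)   # 2.72e6 stall cycles, ~3.72 ms total
```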

image-20240520095021065

If the write buffer stalls are small, we can safely ignore them.

If the cache block size is one word, the write miss penalty is 0.

Combine the reads and writes

In most write-through cache organizations, the read and write miss penalties are the same: the time to fetch the block from memory.

image-20240520095617795

Calculating Cache Performance Example

image-20240520095628972