Garbage Collection

Source: Chapter13.pdf, 编译原理(本) (Compiler Principles, undergraduate course), 刘忠鑫 (Liu Zhongxin)

Garbage Collection: What

Garbage: storage that has been allocated but will never be used again.

Ideally, any record that is not dynamically live (will not be used in the future of the computation) is garbage.

Garbage collection: the process of reclaiming allocated but no longer used storage without an explicit call to free. It happens while the program is running.

Garbage collection is performed not by the compiler but by the runtime system (the support programs linked with the compiled code)

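Garbage collection needs static information supplied by the compiler.

Garbage cannot be identified exactly by static analysis, so, as with liveness analysis, we have to settle for a conservative approximation.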

Idea: Use reachability information as “approximation”

Heap-allocated records that are not reachable by any chain of pointers from program variables are garbage

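Conservative: not reachable implies garbage.

The converse does not hold: garbage may still be reachable, that is, it can still be accessed but will never be used again.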

Basic Data Structure: Directed Graph

image-20250514125639133

image-20250514125648068

Mark-and-Sweep Collection

The mark phase uses DFS: a depth-first search starting from the program variables finds every reachable node and marks it.

image-20250519162608590

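Every unmarked node is unreachable and therefore garbage. The sweep phase reclaims these nodes: it scans the records linearly, links the unmarked nodes into a linked list (the freelist), and clears all the marks in preparation for the next collection.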
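The two phases can be sketched as follows. This is a minimal model rather than a real collector: the `Record` class, the `collect` helper, and the use of a Python list as the freelist (instead of a linked list threaded through the heap) are assumptions made for illustration.

```python
class Record:
    """A heap record: some pointer fields plus one mark bit."""
    def __init__(self, name):
        self.name = name
        self.fields = []      # pointers to other records (or None)
        self.marked = False


def mark(x):
    """Recursive DFS that marks every record reachable from x."""
    if x is None or x.marked:
        return
    x.marked = True
    for child in x.fields:
        mark(child)


def sweep(heap):
    """Scan the whole heap once and return the freelist."""
    freelist = []
    for record in heap:
        if record.marked:
            record.marked = False      # clear the mark for the next collection
        else:
            freelist.append(record)    # never marked, so unreachable
    return freelist


def collect(roots, heap):
    for root in roots:
        mark(root)
    return sweep(heap)


# Tiny heap: a <-> b are reachable from the root; c -> d are not.
a, b, c, d = (Record(n) for n in "abcd")
a.fields = [b]
b.fields = [a]
c.fields = [d]
freelist = collect(roots=[a], heap=[a, b, c, d])
print([r.name for r in freelist])   # ['c', 'd']
```

Note that the cycle between a and b causes no trouble here: the mark bit stops the search from visiting a record twice.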

image-20250519162940627

image-20250519163016990

In the cost analysis, the c terms are constants, and the cost is amortized over the number of nodes reclaimed.
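For reference, the usual textbook form of this estimate can be written out as below. The notation is an assumption, since these notes do not spell it out: $R$ is the amount of reachable data in words, $H$ is the heap size in words, and $c_1$, $c_2$ are the per-word costs of marking and sweeping.

$$
\text{cost of one collection} = c_1 R + c_2 H,
\qquad
\text{amortized cost per reclaimed word} = \frac{c_1 R + c_2 H}{H - R}
$$

The denominator is the amount of storage reclaimed, so the amortized cost grows sharply when $R$ is close to $H$, that is, when almost nothing in the heap is garbage.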

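The problem with this algorithm is that DFS is recursive, so the call stack may overflow.

Solution: use an explicit stack instead of recursion, which makes the traversal non-recursive.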
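A minimal sketch of the non-recursive mark phase, assuming a hypothetical `Record` class with a `fields` list and a `marked` flag. Marking a record when it is pushed (rather than when it is popped) guarantees that each record enters the stack at most once.

```python
class Record:
    def __init__(self, name):
        self.name = name
        self.fields = []
        self.marked = False


def mark_iterative(root):
    """DFS marking with an explicit stack instead of recursion."""
    if root is None or root.marked:
        return
    root.marked = True
    stack = [root]
    while stack:
        x = stack.pop()
        for child in x.fields:
            if child is not None and not child.marked:
                child.marked = True    # mark on push: each record is pushed once
                stack.append(child)


# A chain far deeper than Python's default recursion limit.
chain = [Record(str(i)) for i in range(100_000)]
for parent, child in zip(chain, chain[1:]):
    parent.fields = [child]
mark_iterative(chain[0])
```

A recursive `mark` would raise `RecursionError` on this chain under Python's default settings, while the explicit stack only ever holds one entry at a time here.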

image-20250519163434846

Benefit: H words instead of H activation records

A stack is still needed, though, so let us see whether there is a better way.

Pointer Reversal

The basic idea: store the DFS stack in the directed graph itself

When a new record is encountered during the search:

  1. Mark the record
  2. Change a pointer in the record to point back to the DFS parent record (pointer reversal)
  3. When we can go no deeper, return, following the back links, restoring the links

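In other words, when the traversal enters a node, the pointer it arrived through is reversed so that it points to the parent node, and it is restored on the way back. No separate stack is needed to remember the path.

See page 23 of Chapter13.pdf for an example.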
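The three steps can be sketched as below. This is an illustrative version under stated assumptions, not necessarily the exact code on the slides: each record carries a hypothetical `done` counter (the index of the next field to examine), and `t` always holds the DFS parent of the current record `x`.

```python
class Record:
    def __init__(self, name, nfields):
        self.name = name
        self.fields = [None] * nfields
        self.marked = False
        self.done = 0          # index of the next field to examine


def mark_pointer_reversal(x):
    """Mark everything reachable from x using no stack at all."""
    if x is None or x.marked:
        return
    t = None                   # DFS parent of x (head of the reversed chain)
    x.marked = True
    x.done = 0
    while True:
        i = x.done
        if i < len(x.fields):
            y = x.fields[i]
            if y is not None and not y.marked:
                x.fields[i] = t        # reverse: this field now points to the parent
                t, x = x, y            # descend into y
                x.marked = True
                x.done = 0
            else:
                x.done = i + 1         # nothing to visit through this field
        else:
            y, x = x, t                # x is finished: go back to its parent
            if x is None:
                return                 # back above the root, so we are done
            i = x.done
            t = x.fields[i]            # the reversed field held the grandparent
            x.fields[i] = y            # restore the original pointer
            x.done = i + 1


a, b, c = Record("a", 2), Record("b", 1), Record("c", 0)
a.fields = [b, c]
b.fields = [a]                 # a cycle back to the root
before = [list(r.fields) for r in (a, b, c)]
mark_pointer_reversal(a)
after = [list(r.fields) for r in (a, b, c)]
```

After the call every reachable record is marked and `after` equals `before`, which shows that all reversed pointers were restored.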

image-20250519164500330

See page 35 of Chapter13.pdf for an example.

image-20250519174131365

Reference Counts

Idea: rather than wait for memory to be exhausted, try to collect a record when there are no more pointers pointing to it (not reachable)

Keep track of how many pointers point to each record (the reference count of this record)

  • Whenever a new reference to the record is established, increment its reference count
  • When the reference count goes to 0, the record is unreachable garbage, and thus can be collected.

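In other words, every record keeps an up-to-date count of the pointers that point to it, and it is reclaimed as soon as that count reaches 0.

See page 49 of Chapter13.pdf for an example.

The compiler emits extra instructions to manipulate the reference counts at each assignment operation.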
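What those extra instructions do for an assignment such as `x.f = p` can be sketched as follows. The names `Record`, `inc`, `dec`, `assign_field`, and the global `freed` list are hypothetical. One simplification to be aware of: this sketch releases the children of a dead record immediately and recursively, whereas a real runtime may defer that work.

```python
class Record:
    def __init__(self, name, nfields):
        self.name = name
        self.fields = [None] * nfields
        self.count = 0             # number of pointers to this record


freed = []                         # stands in for the freelist


def inc(r):
    if r is not None:
        r.count += 1


def dec(r):
    if r is None:
        return
    r.count -= 1
    if r.count == 0:
        for child in r.fields:     # the dead record no longer references these
            dec(child)
        freed.append(r)


def assign_field(x, i, p):
    """Bookkeeping around the store x.fields[i] = p."""
    inc(p)                         # increment first, in case p is the old value
    old = x.fields[i]
    x.fields[i] = p
    dec(old)


root = Record("root", 1)           # stands in for a program variable
a, b = Record("a", 1), Record("b", 0)
assign_field(root, 0, a)           # a.count == 1
assign_field(a, 0, b)              # b.count == 1
assign_field(root, 0, None)        # a drops to 0, which in turn releases b
print(sorted(r.name for r in freed))   # ['a', 'b']
```

Incrementing before decrementing matters when the new value is the same record as the old one: doing it the other way round could free a record that is still in use.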

image-20250519165923193

image-20250519165935797

image-20250519171406221

As these examples show, this algorithm cannot reclaim garbage that forms a cycle, and maintaining the reference counts adds considerable overhead.
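The cycle problem is easy to reproduce in a small model. In this sketch (the `Record` class with a single `next` pointer and the `store` helper are assumptions for illustration), two records point at each other; once the program variable lets go, both are unreachable, yet neither count ever reaches 0.

```python
class Record:
    def __init__(self, name):
        self.name = name
        self.next = None           # a single pointer field
        self.count = 0


freed = []


def store(src, new):
    """src.next = new, with reference-count updates."""
    if new is not None:
        new.count += 1
    old, src.next = src.next, new
    if old is not None:
        old.count -= 1
        if old.count == 0:
            store(old, None)       # drop the dead record's outgoing pointer
            freed.append(old.name)


root = Record("root")              # stands in for a program variable
a, b = Record("a"), Record("b")
store(root, a)                     # a.count == 1
store(a, b)                        # b.count == 1
store(b, a)                        # a.count == 2: the cycle a -> b -> a
store(root, None)                  # a.count == 1, so nothing is freed
print(freed, a.count, b.count)     # [] 1 1
```

Both records are now garbage, but each is kept alive by the other's pointer.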

image-20250519174119571

Copying Collection

Basic idea: split the memory into two parts and collect by copying

  • from-space: the one used by the program
  • to-space: the one unused until garbage collection time

When from-space is exhausted, traverse the graph formed by the program variables and from-space, and copy all reachable records to to-space.

When copying is done, the roots are made to point at the to-space copies and the entire from-space is unreachable, so the two spaces swap roles.

In other words, when from-space is used up, the reachable records are copied to to-space and only unreachable records are left behind. The two spaces then exchange roles.

As a result, the copies in to-space are laid out compactly, with no fragmentation.

image-20250519172150272

image-20250519174054526

Pointer Forwarding

We need to traverse all the reachable records, as in mark-and-sweep. When we find a reachable record, we copy it into to-space, and the points-to relations must be preserved.

Every pointer therefore has to be updated so that the new copies point to new copies, not to the old records in from-space.
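A minimal sketch of forwarding, under a few labeled assumptions: records are Python objects, to-space is a plain list, and the forwarding pointer lives in a dedicated `forwarding` attribute (a real collector would typically overwrite a field of the old record instead). The first time a from-space record is forwarded it is copied; every later request returns the same copy, which is what preserves sharing.

```python
class Record:
    def __init__(self, name, fields=(), in_to_space=False):
        self.name = name
        self.fields = list(fields)
        self.in_to_space = in_to_space
        self.forwarding = None     # set once this record has been copied


def forward(p, to_space):
    """Return the to-space version of pointer p, copying on first visit."""
    if p is None or p.in_to_space:
        return p                   # not a from-space pointer: leave it alone
    if p.forwarding is None:
        copy = Record(p.name, p.fields, in_to_space=True)
        to_space.append(copy)      # the copy's fields still point into from-space
        p.forwarding = copy
    return p.forwarding


shared = Record("shared")
x = Record("x", [shared])
y = Record("y", [shared])

to_space = []
x2 = forward(x, to_space)
y2 = forward(y, to_space)
x2.fields[0] = forward(x2.fields[0], to_space)
y2.fields[0] = forward(y2.fields[0], to_space)
```

After these calls `x2.fields[0]` and `y2.fields[0]` are the same to-space object, and `shared` has been copied exactly once.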

image-20250519172501414

image-20250519172546588

image-20250519172612132

Cheney’s Algorithm

Cheney’s algorithm: a copying collection algorithm that uses breadth-first search (BFS) to traverse the reachable data
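The breadth-first traversal can be sketched as follows, reusing the same modeling assumptions as elsewhere in these examples (object records, a list as to-space, a hypothetical `forwarding` attribute). The key point is that to-space itself serves as the BFS queue: `scan` is the index of the next copied record whose fields still need forwarding, and the end of the list plays the role of the `next` pointer.

```python
class Record:
    def __init__(self, name, fields=(), in_to_space=False):
        self.name = name
        self.fields = list(fields)
        self.in_to_space = in_to_space
        self.forwarding = None


def forward(p, to_space):
    if p is None or p.in_to_space:
        return p
    if p.forwarding is None:
        copy = Record(p.name, p.fields, in_to_space=True)
        to_space.append(copy)
        p.forwarding = copy
    return p.forwarding


def cheney(roots):
    """Copy everything reachable from roots; return (new_roots, to_space)."""
    to_space = []
    new_roots = [forward(r, to_space) for r in roots]
    scan = 0
    while scan < len(to_space):            # records in [scan, end) are unscanned
        record = to_space[scan]
        record.fields = [forward(f, to_space) for f in record.fields]
        scan += 1
    return new_roots, to_space


a, b, c, d = (Record(n) for n in "abcd")
a.fields = [b, c]
b.fields = [c]
c.fields = [a]                             # cycle back to a
d.fields = [a]                             # d itself is unreachable
new_roots, to_space = cheney([a])
print([r.name for r in to_space])          # ['a', 'b', 'c']
```

Records land in to-space in breadth-first order, and d is never copied because nothing reachable points to it.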

image-20250519172945201

image-20250519173049531

image-20250519173152051

image-20250519173901338

A Hybrid Algorithm

image-20250519174033337

Interface to the Compiler

image-20250519174220043

Fast Allocation

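Heap allocation goes through the allocate function, and the compiler can try to make this operation faster.

Pages 79 to 85 of Chapter13.pdf cover how to optimize allocate; a general understanding is enough. No screenshots are included here, so see the slides.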
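One common reason allocation can be made cheap is sketched below. It assumes the free storage is a single contiguous region, as it is after a copying collection, so that allocating amounts to a limit check plus bumping a pointer. The `Heap` class and its `next` and `limit` fields are hypothetical names, and the slides may describe additional optimizations.

```python
class Heap:
    def __init__(self, size):
        self.memory = [0] * size
        self.next = 0              # first free word
        self.limit = size          # one past the last usable word

    def allocate(self, n):
        """Return the address of n fresh words by bumping `next`."""
        if self.next + n > self.limit:
            # A real runtime would trigger a garbage collection here.
            raise MemoryError("space exhausted")
        addr = self.next
        self.next += n
        return addr


heap = Heap(8)
first = heap.allocate(3)           # address 0
second = heap.allocate(4)          # address 3
```

With only one word left, a further `heap.allocate(2)` raises `MemoryError` in this sketch, which marks the point where collection would start.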

Describing Data Layouts

See the slides. In short, each record carries one extra word that points to a structure containing its type information.
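A rough sketch of that layout, with everything in it assumed for illustration: memory is a Python list of words, the first word of each record holds a reference to a hypothetical `Descriptor`, and the descriptor says how many fields the record has and which of them are pointers. The collector can then find the pointer fields of any record from its address alone.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Descriptor:
    type_name: str
    num_fields: int
    pointer_fields: Tuple[int, ...]    # indices of the fields that hold pointers


# Hypothetical type: a list cell with an integer head and a pointer tail.
LIST_CELL = Descriptor("list_cell", num_fields=2, pointer_fields=(1,))


def allocate_record(memory, desc):
    """Append a record to memory and return its address."""
    addr = len(memory)
    memory.append(desc)                # the extra descriptor word
    memory.extend([None] * desc.num_fields)
    return addr


def pointer_field_addresses(memory, addr):
    """Addresses of the words in this record that the collector must trace."""
    desc = memory[addr]
    return [addr + 1 + i for i in desc.pointer_fields]


memory = []
p = allocate_record(memory, LIST_CELL)     # occupies words 0..2
q = allocate_record(memory, LIST_CELL)     # occupies words 3..5
```

For the record at address `q`, `pointer_field_addresses(memory, q)` yields only the address of the tail field, so the integer head is never mistaken for a pointer.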

Pointer Map

The collector needs to know which variables contain pointers, and whether each such variable lives in a record or in a register.