Garbage Collection
Garbage Collection: What
Garbage: allocated but no longer used storage
Ideally, any record that is not dynamically live (will not be used in the future of the computation) is garbage.
Garbage collection: the process of reclaiming allocated but no longer used storage without an explicit call to free. It takes place while the program is running.
Garbage collection is performed not by the compiler but by the runtime system (the support programs linked with the compiled code)
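Garbage collection still needs static information provided by the compiler.
Garbage cannot be identified statically, so, as with liveness analysis, the approximation made at compile time has to be conservative.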
Idea: Use reachability information as “approximation”
Heap-allocated records that are not reachable by any chain of pointers from program variables are garbage
Conservative: not reachable implies garbage.
The converse does not hold: garbage may still be reachable (accessible) even though it will never be used again.
Basic Data Structure: Directed Graph
Mark-and-Sweep Collection
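Uses depth-first search (DFS).
Mark: run DFS from the program variables to find every reachable node and mark it.
Sweep: every unmarked node is unreachable and therefore garbage. Scan the records linearly, link the unmarked nodes into a linked list (the freelist), and clear all marks so that the next collection starts clean.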
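The two phases can be sketched on a toy heap as follows. This is only an illustration under stated assumptions: records are Python dicts, pointers are list indices, and the names `make_record`, `mark`, `sweep`, and `collect` are made up for this sketch. The freelist is a plain Python list rather than a list threaded through the free records.

```python
# Toy heap: each record has a mark bit and a list of pointer fields.
# A pointer is an index into `heap`; None stands for a null pointer.

def make_record(*fields):
    return {"marked": False, "fields": list(fields)}

def mark(heap, addr):
    """Recursive DFS: mark every record reachable from addr."""
    if addr is None or heap[addr]["marked"]:
        return
    heap[addr]["marked"] = True
    for child in heap[addr]["fields"]:
        mark(heap, child)

def sweep(heap):
    """Linear scan: collect unmarked records, clear marks on the rest."""
    freelist = []
    for addr, record in enumerate(heap):
        if record["marked"]:
            record["marked"] = False   # reset for the next collection
        else:
            freelist.append(addr)      # unreachable, hence garbage
    return freelist

def collect(heap, roots):
    for root in roots:
        mark(heap, root)
    return sweep(heap)

# Records 0 -> 1 -> 2 are reachable from the root. Records 3 and 4 point
# at each other but are unreachable, so they are reclaimed.
heap = [make_record(1), make_record(2), make_record(None),
        make_record(4), make_record(3)]
freelist = collect(heap, roots=[0])
print(freelist)
```

Note that the unreachable cycle formed by records 3 and 4 is reclaimed, because marking depends only on reachability from the roots.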
In the cost analysis, c denotes a constant, and the cost is amortized over the amount of storage reclaimed.
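For reference, the usual form of this analysis can be written as below. It assumes a heap of H words of which R are reachable, and constants c_1 and c_2 for the per-word cost of marking and of sweeping. These symbols are assumptions here, since the formula itself does not appear in these notes.

```latex
\text{amortized cost} = \frac{c_1 R + c_2 H}{H - R}
```

The numerator is the cost of one collection (marking touches R words, sweeping touches H words) and the denominator is the number of words reclaimed, so the cost per reclaimed word grows sharply as R approaches H.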
The problem with this algorithm is that the DFS is recursive, so its stack may overflow.
Solution: use an explicit stack instead of recursion, making the search non-recursive.
Benefit: H words instead of H activation records
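A minimal sketch of marking with an explicit stack, assuming the same toy representation as elsewhere in these notes (records as dicts, pointers as list indices). The name `mark_iterative` is hypothetical. Each record is marked when it is pushed, so it is pushed at most once and the stack never holds more than H entries.

```python
def mark_iterative(heap, roots):
    """DFS marking that keeps pending records on an explicit stack."""
    stack = []
    for root in roots:
        if root is not None and not heap[root]["marked"]:
            heap[root]["marked"] = True
            stack.append(root)
    while stack:
        addr = stack.pop()
        for child in heap[addr]["fields"]:
            if child is not None and not heap[child]["marked"]:
                heap[child]["marked"] = True   # mark on push: pushed once only
                stack.append(child)

# A chain of 100,000 records: far deeper than Python's default recursion
# limit, so a recursive DFS would fail here, but the explicit stack is fine.
n = 100_000
heap = [{"marked": False, "fields": [i + 1 if i + 1 < n else None]}
        for i in range(n)]
mark_iterative(heap, [0])
marked_count = sum(r["marked"] for r in heap)
print(marked_count)
```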
A stack is still needed, though; let us see whether there is a better way.
Pointer Reversal
The basic idea: store the DFS stack in the directed graph itself
When a new record is encountered during the search:
- Mark the record
- Change a pointer in the record to point back to the DFS parent record (pointer reversal)
- When we can go no deeper, return, following the back links, restoring the links
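That is, on reaching a node, reverse the pointer that was followed so that it points to the parent node, and restore it on the way back; no stack is then needed to keep track of the traversal order.
Chapter13.pdf, page 23, has an example.
Chapter13.pdf, page 35, has an example.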
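The idea can be sketched in code as follows. This is an illustrative version under assumptions, not the slides' exact algorithm: records are dicts whose `done` entry counts how many fields have already been processed, pointers are list indices, and the name `mark_pointer_reversal` is hypothetical.

```python
def mark_pointer_reversal(heap, root):
    """Mark everything reachable from root without any auxiliary stack."""
    if root is None or heap[root]["marked"]:
        return
    t = None                          # parent of x: top of the implicit stack
    x = root
    heap[x]["marked"] = True
    heap[x]["done"] = 0
    while True:
        i = heap[x]["done"]
        fields = heap[x]["fields"]
        if i < len(fields):
            y = fields[i]
            if y is not None and not heap[y]["marked"]:
                fields[i] = t         # reverse: field i now points to the parent
                t, x = x, y           # descend into the child
                heap[x]["marked"] = True
                heap[x]["done"] = 0
            else:
                heap[x]["done"] = i + 1
        else:
            y, x = x, t               # all fields done: go back to the parent
            if x is None:
                return
            i = heap[x]["done"]
            t = heap[x]["fields"][i]  # the reversed field holds the grandparent
            heap[x]["fields"][i] = y  # restore the original pointer
            heap[x]["done"] = i + 1

def rec(*fields):
    return {"marked": False, "done": 0, "fields": list(fields)}

# 0 -> 1 -> 3, 0 -> 2 -> 0 (a cycle back to the root); record 4 is unreachable.
heap = [rec(1, 2), rec(3, None), rec(0), rec(), rec(0)]
original_fields = [list(r["fields"]) for r in heap]
mark_pointer_reversal(heap, 0)
marks = [r["marked"] for r in heap]
print(marks)
```

After the traversal every reversed field has been restored, so the graph is unchanged apart from the mark bits.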
Reference Counts
Idea: rather than wait for memory to be exhausted, try to collect a record when there are no more pointers pointing to it (not reachable)
Keep track of how many pointers point to each record (the reference count of this record)
- Whenever a new reference to the record is established, increment its reference count; whenever a reference to it is removed, decrement the count
- When the reference count goes to 0, the record is unreachable garbage, and thus can be collected.
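That is, each record keeps an up-to-date count of how many pointers point to it, and it is reclaimed as soon as the count reaches 0.
Chapter13.pdf, page 49, has an example.
The compiler emits extra instructions to manipulate the reference counts with each assignment operation.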
As can be seen, this algorithm cannot reclaim garbage that forms a cycle, and maintaining the reference counts adds considerable overhead.
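Both points can be seen in a small simulation. The sketch below is hypothetical throughout: the `Heap` class and its `alloc`, `inc`, `dec`, and `store` methods are invented for illustration, with `store` standing in for the extra instructions a compiler would emit around an assignment `x.f <- target`. References held by program variables are modelled by calling `inc` and `dec` directly.

```python
class Heap:
    def __init__(self):
        self.records = {}      # addr -> {"count": int, "fields": dict}
        self.next_addr = 0
        self.freed = []        # addresses reclaimed so far

    def alloc(self, **fields):
        addr = self.next_addr
        self.next_addr += 1
        self.records[addr] = {"count": 0, "fields": {}}
        for name, target in fields.items():
            self.store(addr, name, target)
        return addr

    def inc(self, addr):
        if addr is not None:
            self.records[addr]["count"] += 1

    def dec(self, addr):
        if addr is None:
            return
        record = self.records[addr]
        record["count"] -= 1
        if record["count"] == 0:
            # Reclaim the record and drop the references it held.
            for child in record["fields"].values():
                self.dec(child)
            del self.records[addr]
            self.freed.append(addr)

    def store(self, addr, name, target):
        """Model of x.f <- target with reference-count maintenance."""
        old = self.records[addr]["fields"].get(name)
        self.inc(target)       # increment first, so x.f <- x.f stays safe
        self.records[addr]["fields"][name] = target
        self.dec(old)

h = Heap()
a = h.alloc(); h.inc(a)        # variable a refers to record a
b = h.alloc(next=a); h.inc(b)  # variable b refers to record b, b.next -> a
h.dec(b)                       # variable b goes away: b is reclaimed at once

c = h.alloc(); h.inc(c)        # variable c refers to record c
d = h.alloc(peer=c)            # d.peer -> c
h.store(c, "peer", d)          # c.peer -> d: c and d now form a cycle
h.dec(c)                       # variable c goes away, yet nothing is freed

print(h.freed, sorted(h.records))
```

Records c and d are unreachable at the end but each still has a count of 1 because of the other, which is exactly the cycle problem. Every `store` also pays for one increment and one decrement.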
Copying Collection
Basic idea: split the memory into two parts and collect by copying
- from-space: the one used by the program
- to-space: the one unused until garbage collection time
When from-space is exhausted, traverse the graph formed by the program variables and from-space, and copy all reachable records to to-space.
When copying is done, the roots are made to point at the to-space copies; the entire from-space is then unreachable, and from-space and to-space swap roles.
That is, when from-space is used up, the reachable records are copied to to-space, leaving only unreachable records behind.
The two spaces then exchange roles.
As a result, the data in to-space after copying is compact, with no fragmentation.
Pointer Forwarding
We need to traverse all the reachable records, as in mark-and-sweep. When we find a reachable record, we copy it into to-space, and we have to preserve the points-to relations.
Every pointer must be updated so that the new copies point to new copies rather than to the old records.
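One common way to keep the points-to relations consistent is to leave a forwarding pointer in each from-space record once it has been copied. The sketch below assumes that approach; the `forward` function and the dict-based record layout are illustrative only.

```python
def forward(p, from_space, to_space):
    """Return the to-space address for from-space pointer p.

    The record is copied the first time it is seen; later calls find the
    forwarding pointer left in the old record and reuse the same copy.
    """
    if p is None:
        return None
    record = from_space[p]
    if "forward" in record:                 # already copied
        return record["forward"]
    new_addr = len(to_space)
    # The copied fields still hold from-space addresses at this point;
    # they are fixed later by forwarding each of them in turn.
    to_space.append({"fields": list(record["fields"])})
    record["forward"] = new_addr            # leave the forwarding pointer
    return new_addr

# Two roots point at the same record (address 2 in from-space).
from_space = [{"fields": []}, {"fields": []}, {"fields": [0]}]
to_space = []
x = forward(2, from_space, to_space)
y = forward(2, from_space, to_space)
print(x, y, len(to_space))
```

Both roots end up with the same to-space address and only one copy is made, so sharing in the original graph is preserved.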
Cheney’s Algorithm
Cheney’s algorithm: a collection algorithm using breadth first search (BFS) to traverse the reachable data
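A minimal sketch of the breadth-first scheme, under the assumption that to-space itself serves as the BFS queue: records between the `scan` index and the end of to-space have been copied but their fields not yet forwarded. The name `cheney_collect` and the dict-based records are hypothetical.

```python
def cheney_collect(from_space, roots):
    """Copy everything reachable from roots into a fresh to-space (BFS)."""
    to_space = []

    def forward(p):
        if p is None:
            return None
        record = from_space[p]
        if "forward" not in record:          # first visit: copy the record
            record["forward"] = len(to_space)
            to_space.append({"fields": list(record["fields"])})
        return record["forward"]

    new_roots = [forward(r) for r in roots]
    scan = 0
    # len(to_space) plays the role of the `next` allocation pointer.
    while scan < len(to_space):
        record = to_space[scan]
        record["fields"] = [forward(f) for f in record["fields"]]
        scan += 1
    return to_space, new_roots

# Record 1 is the root and points to 3 and 2; record 2 points back to 1.
# Records 0 and 4 are unreachable.
from_space = [{"fields": [1]}, {"fields": [3, 2]}, {"fields": [1]},
              {"fields": []}, {"fields": []}]
to_space, new_roots = cheney_collect(from_space, roots=[1])
print(new_roots, [r["fields"] for r in to_space])
```

The survivors are laid out contiguously in BFS order (old records 1, 3, 2), and no stack or separate queue is required.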
A Hybrid Algorithm
Interface to the Compiler
Fast Allocation
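Heap allocation goes through the allocate function, so the compiler can try to make this operation faster.
Chapter13.pdf, pages 79-85, cover how to optimize allocate; a general understanding is enough. No screenshots here; see the slides.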
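As a rough illustration of why allocation can be cheap, the sketch below assumes a copying collector, where free space is one contiguous region and allocating amounts to bumping a pointer. The `BumpAllocator` class and its `next` and `limit` fields are assumptions made for this sketch, not taken from the slides.

```python
class BumpAllocator:
    def __init__(self, size):
        self.memory = [0] * size
        self.next = 0          # first free word
        self.limit = size      # end of the free region

    def allocate(self, n):
        """Allocate n words: one comparison and one addition."""
        if self.next + n > self.limit:
            # A real runtime would trigger a collection here instead.
            raise MemoryError("free region exhausted")
        result = self.next
        self.next += n
        return result

allocator = BumpAllocator(8)
first = allocator.allocate(3)
second = allocator.allocate(3)
try:
    allocator.allocate(10)
    overflowed = False
except MemoryError:
    overflowed = True
print(first, second, overflowed)
```

Because the fast path is only a test and an add, a compiler could plausibly inline it at each allocation site rather than call a function.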
Describing Data Layouts
See the slides: each record carries one extra word that points to a structure containing its type information.
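One way to picture this is sketched below, with a hypothetical `Descriptor` type holding the record size and the positions of its pointer fields. The field names and the list-based record layout are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Descriptor:
    name: str
    size: int                # number of data fields in the record
    pointer_fields: tuple    # indices of the fields that hold pointers

# A list cell with an integer payload (field 0) and a next pointer (field 1).
LIST_CELL = Descriptor("list_cell", 2, (1,))

def pointers_in(record):
    """Use the descriptor word to find the pointer fields of a record."""
    descriptor = record[0]              # the extra word at the start
    return [record[1 + i] for i in descriptor.pointer_fields]

cell = [LIST_CELL, 42, 7]               # payload 42, next pointer 7
found = pointers_in(cell)
print(found)
```

With such a descriptor the collector can tell that 7 is a pointer to follow while 42 is plain data.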
Pointer Map
We need to determine which variables contain pointers, and whether each variable lives in a record or in a register.
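A pointer map can be pictured as a table, produced by the compiler, that the collector consults to find its roots. In the sketch below the table is keyed by a return-address label; the label, the slot numbers, the register name, and the `roots_at` function are all hypothetical.

```python
# For each point where a collection may start, the compiler records which
# frame slots and which registers hold live pointers.
POINTER_MAPS = {
    "ret_after_call_f": {"frame_slots": [0, 2], "registers": ["r5"]},
}

def roots_at(return_address, frame, registers):
    """Collect the root pointers described by the pointer map."""
    pointer_map = POINTER_MAPS[return_address]
    roots = [frame[i] for i in pointer_map["frame_slots"]]
    roots += [registers[r] for r in pointer_map["registers"]]
    return roots

frame = [100, 12345, 108]               # slot 1 holds an integer, not a pointer
registers = {"r4": 999, "r5": 116}      # r4 holds a non-pointer value
roots = roots_at("ret_after_call_f", frame, registers)
print(roots)
```

Only the slots and registers listed in the map are treated as roots, so the integer in slot 1 and the value in r4 are never mistaken for pointers.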