Skip to content

Lutienpeist重开的计算机生活

10

Lecture 10 | Indexing

:material-circle-edit-outline: 约 810 个字 :material-clock-time-two-outline: 预计阅读时间 3 分钟

Chapter 14: Indexing

Basic Concepts

Indexing mechanisms used to speed up access to desired data.

例如图书馆找书，根据书名首字母，按字母表排序找

Search Key - attribute to set of attributes used to look up records in a file

An index file consists of records (called index entries) of the form:

Index files are typically much smaller than the original file

Two basic kinds of indices:
- Ordered indices: search keys are stored in sorted order
  - 就像按首字母顺序找书
- Hash indices: search keys are distributed uniformly across “buckets” using a “hash function”.
  - 哈希函数转化，根据search key计算出地址
  - 但是很难找到均匀的哈希函数
  - 所以还是用Ordered indice更多

Ordered Indices

Primary index& Secondary index

In an ordered index, index entries are stored sorted on the search key value.

Primary index（主索引）: in a sequentially ordered file, the index whose search key specifies the sequential order of the file.
- Also called clustering index（聚集索引）
- The search key of a primary index is usually but not necessarily the primary key

即，源文件中的record已经按照某attribute的顺序排下来，这个attribute上的index就是primary index

search key次序与物理文件排放次序一样

最高效的一种

对一个物理文件，基本上最多只有一个主索引

Secondary index（辅助索引）: an index whose search key specifies an order different from the sequential order of the file.
- Also called non-clustering index.

search key次序与物理文件排放次序不一样

Index-sequential file（索引顺序文件）: ordered sequential file with a primary index.

注意不是直接40000->record里的40000，而是中间多一个容器

这些容器包含所有属性值相同的record的地址，比如40000指向的容器包含所有属性值为40000的record地址

因为一个40000可能对应多个record

Dense Index Files & Sparse Index Files

Dense index(稠密索引) — Index record appears for every search-key value in the file.

每个record都有个索引项

Sparse Index（稀疏索引）: contains index records for only some search-key values.

查找方法类似B+树找leaf

稀疏索引只能建立在主索引上

辅助索引很难更新

Multilevel Index（多级索引）

If primary index does not fit in memory, access becomes expensive.

Solution: treat primary index kept on disk as a sequential file and construct a sparse index on it.

outer index – a sparse index of primary index
inner index – the primary index file

If even outer index is too large to fit in main memory, yet another level of index can be created, and so on.

Indices at all levels must be updated on insertion or deletion from the file.

不就是B+树嘛

B+-Tree Index Files

B+-tree index files are an alternative to indexed-sequential files.

B+树每一个node占用disk的一个block，4096byte=4KiB】、

B+树的各类操作见ADS笔记，会考

B+- tree : height and size estimation

重要考点，计算空间大小

注意小数是往上取还是往下取

每个record的大小

每个block可以放多少个record

n个record需要多少个block

fan-out是什么玩意不知道

哦哦，是一个node多少个pointer，即多少阶

注意叶子删掉，不用管内部节点，不然要再删除一次，与ADS讲的不一样

B+-Tree File Organization

之前讲的有堆，顺序，哈希

Other Issues in Indexing

针对second key的难更新问题，我们可以这样

一个B+树通过second key的属性值找对应的primary key

另一个B+树通过primary key 找到物理地址

Indexing Strings

第三节课没听240429

ch14 - Indexing(4).pdf

Bulk Loading and Bottom-Up Build

Bottom-Up B+-tree Build