Lecture 11 | Query Processing

:material-circle-edit-outline: 约 1512 个字 :fontawesome-solid-code: 3 行代码 :material-clock-time-two-outline: 预计阅读时间 5 分钟

数据库系统2024-05-06第6-8节 (zju.edu.cn)

查询处理

就是看数据库怎么处理SQL指令这么一个字符串的

Basic Steps in Query Processing

Parsing and translation
- 语法检查
- 转化为关系代数表达式
Optimization
- 查询优化
- 最重要的一步、
  - 关系代数等价变化化简
  - 对代数里的每个算子选择合适的算法
    - 每一个算子都有不同的实现方法
- 得到execution plan
  - 这个是数据库内部的计划语言
Evaluation
- execution plan交给引擎，搜索语言

给个例子语法检查和查询优化的

select name, title 
from instructor natural join (teaches natural join course)
where dept_name=‘Music’ and year = 2009;

从叶子往上执行，右边这个不是最终优化结果，而是优化过程中根据某个规则得到的，这些规则一般都是经验规则

这里的规则是把选择运算推到了叶子节点，提前选择以减少后续运算量，用于第一次优化

优化后得到执行计划文件：

An evaluation plan defines exactly what algorithm is used for each operation, and how the execution of the operations is coordinated.

接下来我们看下，每个算子有哪些常见的算法，以及如何衡量、权衡

我们就将SELECT，JOIN，SORT三类

Measures of Query Cost

通常是磁盘访问时间作为cost的大头

Number of seeks * average-seek-cost
Number of blocks read * average-block-read-cost
Number of blocks written * average-block-write-cost

一些细节

Cost to write a block is greater than cost to read a block

data is read back after being written to ensure that the write was successful

方便起见，我们只用磁盘块的读取数量以及寻找次数来衡量

$t_T$ - time to transfer one block
$t_S$ - time for one seek

这俩的比值相对稳定，用的比较多

还有一些细节

We ignore CPU costs for simplicity
We ignore cost to writing output to disk in our cost formulae
- 即我们约定忽略数据最终被写回磁盘的cost，因为流水线模式下数据不会被写回磁盘
We often use worst case estimates, assuming only the minimum amount of memory needed for the operation is available

Algorithm

接下来看下都有啥子算法

Selection Operation

Algorithm A1 (linear search)

File scan

Scan each file block and test all records to see whether they satisfy the selection condition.

worst cost = $b_r * t_T + t_S$
- $b_r$ denotes number of blocks containing records from relation r
average cost = $(b_r /2) t_T + t_S$

If selection is on a key attribute, can stop on finding record

Note: binary search generally does not make sense since data is not stored consecutively except when there is an index available, and binary search requires more seeks than index search

A2 (B+-tree index equality on key)

Index scan – search algorithms that use an index

A2 (primary B+-tree index / clustering B+-tree index, equality on key).

Cost = $(h_i + 1) * (t_T + t_S)$

我们认为B+树在磁盘里，因为每个node占一个block，所以每个node都需要一次seek + 一次transfer

A3 (B+-tree index equality on non-key)

equality 是等值查找的意思

A3 (primary B+-tree index/ clustering B+-tree index, equality on non-key)

Cost = $h_i * (t_T + t_S) + t_S + t_T * b$

下面的是书上没有的，算是补充？

这个意思是n个record可能分散在m个block

乱序

A5 (B+-index index, comparison)

之前是等值查询，这里是比较查询

分两种情况
- 找出大于V的，那就index出第一个大于V的，剩下的顺序找到结尾可，不用index
  - Cost = $h_i * (t_T + t_S) + t_S + t_T * b$
- 找出小于V的，那都不用index了，直接从头找，找到大于V的停止即可

Implementation of Complex Selections

External Sort-Merge

我们考虑的是内存不够大的情况，即需要用到外部排序

Let $M$ denote memory size (in pages).

mem有M个page

Create sorted runs(归并段)
1. Let $i$ be $0$ initially.
2. Repeatedly do the following till the end of the relation: 1. Read $M$ blocks of relation into memory 2. Sort the in-memory blocks 3. Write sorted data to run $R_i$ ; increment $i$.
3. Let $N$ be the final value of $i$ - 这个N是指有几个run
Merge the runs (N-way merge).
1. IF $N < M$, single merge pass is required (如果归并段少于可用内存页 ) 1. Use $N$ blocks of memory to buffer input runs, and 1 block to buffer output. Read the first block of each run into its buffer page 2. repeat 1. Select the first record (in sort order) among all buffer pages 2. Write the record to the output buffer. 1. If the output buffer is full, write it to disk 3. Delete the record from its input buffer page. 1. If the buffer page becomes empty then read the next block (if any) of the run into the buffer 3. until all input buffer pages are empty
2. If $N \ge M$, several merge passes are required. 1. In each pass, contiguous groups of $M - 1$ runs are merged. 2. A pass reduces the number of runs by a factor of $M - 1$, and creates runs longer by the same factor. 1. E.g. If M=11, and there are 90 runs, one pass reduces the number of runs to 9, each 10 times the size of the initial runs 3. Repeated passes are performed till all runs have been merged into one

cost分析

br是数据占的block，M是内存buffer有的block

M-1是一次能归并的block，还有一个留来输出

Join Operation

有五种典型算法

Nested-loop join

Block nested-loop

join Indexed

nested-loop join

Merge-join Hash-join

Nested-Loop Join

两重循环，遍历r和s的tuple

外循环的叫outer relation，内循环的叫inner relation

Block nested-loop

循环基本单位改成块，然后块与块再一个嵌套循环

Worst case estimate: $ b_r+b_r * b_s$ block transfers + $2 * b_r$ seeks

Each block in the inner relation $s$ is read once for each block in the outer relation

可见小关系放外循环比较好

Best case: $b_r + b_s$ block transfers + $2$ seeks

即内存足够大，r和s一次性进去即可

注意，上面是只有2个buffer block的情况，实际上我们有很多buffer block，下面讲多个buffer的情况

Indexed Nested-Loop Join

我们之前的思路都是外循环与内循环一一匹配

我们尝试通过索引替代内循环

Index lookups can replace file scans if
- join is an equi-join or natural join
- and an index is available on the inner relation’s join attribute

对于每个r的tuple，我们通过索引查找满足条件的s的tuple

Worst case: buffer has space for only one page of r, and, for each tuple in r, we perform an index lookup on s.

Cost of the join: $b_r (t_T + t_S) + n_r * c$

Where c is the cost of traversing index and fetching all matching s tuples for one tuple or r
- c can be estimated as cost of a single selection on s using the join condition.

If indices are available on join attributes of both r and s, use the relation with fewer tuples as the outer relation.