Buffer Overflow

This topic is closely tied to programming languages (PL): in many languages, buffer overflows cannot occur, for a number of reasons

Wikis are not suitable to cite as references

Background

Type safety

Type safety and type soundness are the extent to which a programming language discourages or prevents type errors.

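Java is type-safe: at run time it checks whether an access goes out of bounds, i.e. outside the defined range. C is type-unsafe: a program can operate on addresses outside the defined range, which is why buffer overflows can occur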

For example, with int a[10]; a[10] = 3, Java performs runtime array bounds checking and throws an exception (ArrayIndexOutOfBoundsException), whereas C performs no array bounds checking

Memory Management

C requires memory to be allocated and freed manually

gets, fgets, strcpy

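With strncpy, if src is too long (at least as long as the length argument), dest ends up without the terminator '\0'; later reads of dest can run past the buffer and may cause a segmentation fault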

So the length is usually set to sizeof(dest)-1, and the last position of dest is then set to the terminator by hand

Program Memory Layout

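#include <stdlib.h>

// initialized global variable: stored in .data
int x = 100;

int main(){
    // local variables: stored on the stack
    int a = 2;
    float b = 2.5;

    // uninitialized static variable: stored in .bss
    static int y;

    // allocate memory on the heap; ptr itself lives on the stack
    // and points to the heap block
    int *ptr = (int*) malloc(2*sizeof(int));
    if (ptr == NULL) return 1;

    // values 5 and 6 are stored on the heap
    ptr[0] = 5;
    ptr[1] = 6;

    // deallocate the heap memory
    free(ptr);

    return 0;
}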

image-20241013112441573

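Section   Main purpose
.text     Usually holds the program's executable code
.rodata   Usually holds constants and other read-only data
.data     Usually holds initialized global and static variables
.bss      Usually holds uninitialized global and static variables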

Stack Layout

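Let's look at how the stack changes when a function is called. Note that the stack grows from high addresses toward low addresses in memory, so the top of the stack is at the lowest address (fp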

void func(int a, int b){
    int x, y;
    x = a + b;
    y = a - b;
}

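When func is called, a block of memory is allocated downward from the current top of the stack (the lowest address), and the following are pushed in order

  1. The arguments are pushed first, in reverse order: b first, then a
  2. Next comes the return address, i.e. the address of the instruction that follows the call in the caller; it is used to return to the caller when the callee finishes
  3. Then the caller's frame pointer ebp is pushed
    • It serves as the base address: adding an offset to it reaches retaddr, subtracting an offset reaches the local variables
      • image-20241013120134921
    • The live ESP is not used for this because it keeps changing at run time
  4. Finally come the callee's local variables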

image-20241013115004381

Stack Frame

  • The stack pointer
    • points to the top of the stack
    • RSP in Intel x86-64, ESP in Intel x86-32
  • The frame pointer
    • points to the base of the current frame (the slot holding the saved old frame pointer)
    • also called the base pointer
    • RBP in Intel x86-64, EBP in Intel x86-32

caller & callee

The call instruction performs the call: it first pushes the return address onto the stack, then jumps to the callee

image-20241013120644086

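Below is the main function. The red box marks the part that calls the function: the arguments are pushed in reverse order, then call is used. The return address here is 1205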

image-20241013120656853

After the function call, the stack looks like this:

image-20241013122150193

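Below is the function named function

Different assembly syntaxes define mov differently: AT&T syntax writes mov source, destination, while Intel syntax writes mov destination, source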

image-20241013121345459

In the prologue, ebp is first pushed onto the stack, then esp is copied into ebp

After the prologue, ebp points to the bottom of old ebp, so it can be used as the base address

image-20241013123010317

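Note that there is unused space between old ebp and the local variables. Also remember that a local variable is filled from low addresses toward high addresses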

In the epilogue, leave runs first, then ret. The two instructions are equivalent to the following

leave ==
  mov %ebp, %esp
  pop %ebp

ret ==
  pop %eip

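That is, esp is first reset to the bottom of old ebp, then pop takes old ebp off the stack and restores it into ebp

eip is the PC: ret hands retaddr to the PC, which returns control to the caller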

How to Exploit Buffer Overflow

We focus on overflows on the stack, on a 32-bit system

A buffer overflow occurs when data is written outside of the boundaries of the memory allocated to a particular data structure

  • modify
    • return address on the stack
    • function pointer
    • local variable
    • heap data structures

Smashing the Stack

Occurs when a buffer overflow overwrites data in the program stack

Successful exploits can overwrite the return address on the stack, allowing execution of arbitrary code on the targeted machine

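Because a local variable is written from low addresses toward high addresses, an overflow has the chance to run upward and overwrite the return address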

image-20241013125030812

  • The user input buffer will overwrite the return address on the stack
    • Case I: the overwritten return address is invalid -> crash (why?)
    • Case II: the overwritten return address is valid but in kernel space
    • Case III: the overwritten return address is valid, but points to data
    • Case IV: the overwritten return address happens to be a valid one

We have to overwrite exactly the retaddr slot, and control exactly where the new retaddr jumps to

How To Exploit

We can overwrite the return address on the stack so that it points to the shellcode

image-20241013125552170

Shellcode

shellcode =

b'\x6a\x42\x58\xfe\xc4\x48\x99\x52\x48\xbf\x2f\x62\x69\x6e\x2f\x2f\x73\x68\x57\x54\x5e\x49\x89\xd0\x49\x89\xd2\x0f\x05'

image-20241013132730338

Defense

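  • Three approaches
    • Avoid vulnerabilities in the program itself
    • Accept that the program has vulnerabilities and raise the cost of an attack
    • Restrict privileges to reduce the impact of an attack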

ASLR

Address space layout randomization (ASLR)

ASLR randomly arranges the address space positions of key data areas of a process, including the base of the executable and the positions of the stack, heap and libraries.

Stack Guard

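Put a canary between the return address and the local variables

  • Canary should be random
  • The reference copy of the canary should not be on the stack
    • to prevent an overflow from overwriting it together with the canary

Before the function returns to the caller, the canary is checked; if it has changed, the program aborts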

image-20241013133700243

QUIZ

image-20241014081849160