计算机系统概论 (ICS)

一些重要概念
  1. 计算机体系结构 (Computer Architecture)(狭义):系统程序员所看到的计算机的属性,即计算机的逻辑结构和功能特征,包括其各个硬部件和软部件之间的相互关系。(example: IBM 360)

    冯·诺依曼架构 (Von Neumann):① 采用二进制逻辑,② 程序存储执行, ③ 五部分组成(Input / Output Device, CPU (Central Processing Unit) = Control Unit + ALU (Arithmetic / Logic Unit) + registers, Memory Unit

    • 以指令为中心,存算分离:PC / Program Counter in Control Unit 负责记录下一条执行指令的地址,操作对象为Main Memory 主存/内存(程序代码与数据存储于此)
    • 程序状态包括:① Control Unit状态(如register file、PC), ② Main Memory中的程序数据

    image-20220919160543418

    image-20220919160613582

  2. 计算机组成 (Computer Organization):计算机系统的逻辑实现,与“计算机体系结构”并称为广义概念上的计算机体系结构

  3. 计算机实现 (Computer Implementation):计算机系统的物理实现

 

1 数据

1.1 整数

预备知识

bit (比特):1个二进制位 Byte (字节):8个二进制位,1Byte=8bit Word (字):2个字节,1Word=2Byte=16bit(X86架构;RISC-V架构下1Word=4Byte=32bit

存储单位换算(kilo, Mega, Giga, Tera, Peta, Exa) 1K=210bit 1M=220bit 1G=230bit 1T=240bit 1P=250bit 1E=260bit

image-20220919163055169

1. 整数表示

C语言中基本数据类型的大小 (in Bytes)

image-20220919163440657

重点记忆:

2. 二进制编码 (w表示字长)
原码和补码

image-20220919164436532

无符号数和带符号数

image-20220919164620068

image-20220919165826050

无符号整数加法 & 补码加法
无符号整数 & 带符号整数 除以2的k次幂

 

1.2 浮点数

1. IEEE Floating Point (IEEE 754)

二进制表示方法:k=jibk·2k

计算机中的浮点数二进制表示:

先计算出Bias=2(e1)1,其中e为exp域的位数 区分Normalized和Denormalized,算出E=ExpBiasE1BiasE=n/a

规格化浮点数(Normalized)

满足条件exp000...0,exp111...1

  1. 真实的阶码值需要减去一个偏置(biased)量

    image-20220926155957913

  2. frac域的第一位隐含为1(因此可以省去)

非规格化浮点数(Denormalized)

满足条件exp=000...0M=0.xxx...x => E=1Bias,M=0.xxx...x

具体实例

一些特殊值

满足条件exp=111...1 => E=n/a

具体实例

一种“小”浮点数实例

8位浮点数表示:exp域宽度为4 bits,frac域宽度为3 bits,bias=7

image-20220926164227913

推广到单精度与双精度浮点数:

image-20220926164120246

浮点数的编码特性
2. Rounding(舍入)

各种舍入模式:Zero, Round down, Round up, Nearest Even

向偶数舍入(Round-To-Even) 也称“向最接近值的舍入”,其他方式会产生统计误差(statistically biased) 关键的设计决策是确定两个可能结果的中间数值的舍入

image-20220926165256345

“Even”意味着最低有效数字需为0,而最低有效数字右侧的位串为100...

image-20220926165543434

实例:将无符号数转换为8位浮点数并舍入

① 规格化 => ② 舍入 => ③ 调整/Postnormalize(舍入操作可能引起溢出)

image-20221126165212816

image-20221126165245607

image-20221126165428553

3. C语言中的浮点数(类型转换)

int(32位宽),float(1+8+23=32位宽),double(1+11+52=64位宽)

 

 

2 代码

2.1 汇编语法

1. 汇编体系结构

(冯诺伊曼架构 / Control Flow)

image-20220926172708513

寄存器 vs 内存

image-20220926173731869

2. 汇编代码

(web)Compiler Explorer

生成汇编代码的命令行:gcc -Og -S sum.c #-fno-stack-protector 其中:-S生成文件sum.s-c的话生成sum.o(即对象文件,内含机器指令)

反汇编命令:objdump -S/-d a.o, objdump -S/-d a

2.0 操作类型
2.0 通用寄存器

通用寄存器 / General-Purpose Register

note_v2

当参数少于7个时,参数从左到右放入寄存器:rdi, rsi, rdx, rcx, r8, r9当参数为7个及以上时,前6个传送方式不变,但后面的依次“右向左”放入栈中

被调用者保存:汇编程序通常而言,在调用函数前,如果这些寄存器里存放了东西,需要把它们存到内存(栈)里

2.1 基本概念(AT&T)
2.2 常用指令(CSDN)
  1. 运算指令

  2. 拷贝指令

  3. 流程控制指令

  4. 汇编指示符(assembler directive) 编译出的汇编文件的开始处,有一段由句号. 开头的指令,是AT&T 汇编指示符。这里命令名的其余是字母,通常使用小写。这些不会被翻译成机器指令,而是给汇编器一些特殊指示,称为汇编指示(Assembler Directive)伪操作(Pseudo-operation)

    • .byte表达式 可不带参数或者带多个表达式参数,表达式之间由逗号分隔,每个表达式参数都被会变成下一个字节
    • .word表达式 这个表达式表示任意一节中的一个或多个表达式,表达式占一个字(两个字节)
    • .file字符(string) 通告编译器我们准备开启一个新的逻辑文件,string是新文件名。
    • .text小节(subsection) 通知as编译器把后续语句汇编到编号为subsection的正文子段的末尾,subsection是一个纯粹的表达式
    • .code16:告诉编译器生成16位的指令
    • .globl:定义全局符号
    • .fill repeat, size, value
2.3 栈操作

每个汇编代码块的开始都有以下一样的汇编指令:

可以简单的理解为:保存上一个函数的栈底指针,然后重新开始一个新的栈

 

3. 指令操作(上课PPT)
3.1 数据传送指令(mov)

mov指令集movb, movw, movl, movqmovslq(将一个双字节符l拓展移动到四字地址q中),movzwl(读取一个字word然后将其拓展到双字节l

MOVS类指令:将较小的源复制到较大的目的时使用,通过符号扩展来填充 MOVZ类指令:将较小的源复制到较大的目的时使用,通过剩余字节补0来填充

基本语法movq Source, Dest

操作数类型

image-20221010101011642

3.2 寻址模式
  1. 间接寻址

    格式:(R) 寄存器R指定内存地址 内存地址:Mem[Reg[R]] 指令:movq (%rcx), %rax

  2. 基址+偏移量寻址

    格式:D(R) 寄存器R指定内存起始地址,常数D给出偏移量 内存地址:Mem[Reg[R]+D] 指令:movq 8(%rbp), %rdx

  3. 变址模式

    格式:D(Rb,Ri,S) D:常量(地址偏移量) S:比例因子(1, 2, 4 or 8) Rb:基址寄存器,16个通用寄存器之一 Ri:索引寄存器,%rsp不作为索引寄存器

    内存地址:Mem[Reg[Rb]+SReg[Ri]+D] 指令(三种常见变形): (Rb,Ri): Mem[Reg[Rb]+Reg[Ri]] (D(Rb,Ri)): Mem[Reg[Rb]+Reg[Ri]+D] (Rb,Ri,S): Mem[Reg[Rb]+SReg[Ri]]

    [EG: 注意 常数不用加$%rdx == 0xf000, %rcx == 0x100]

    image-20221010102020470

X86-64:默认指针参数类型为64位,(如果使用int数据类型,被操作的数据仍是32位)

image-20221010103208350

image-20221010103229568

3.3 地址计算指令

基本语法leaq Src, Dest 其中,Src内存地址计算表达式,计算出来的地址赋给Dest 使用实例:地址计算(无需访存),[EG]: translation of p = &x[i]; 进行x+k×y这一类型的整数计算,其中k=1,2,4,8 [EG]:long m12(long x) { return x * 12; } ==> leaq(%rdi, %rdi, 2), %rax # t <= x+x*2 salq $2, %rax # return t<<2

3.4 整数计算指令

image-20221010104503748

image-20221010104758875

image-20221010105748168

image-20221010110216442

 

4. X86汇编格式

At&T FormatIntel / Microsoft Format:别名关系

GAS / GNU Assembler:汇编器

X86-64指令的特点

 

2.2 指令结构

1. 控制流(control flow)

处理器:从启动到关闭,CPU读取和执行(解释)一个指令序列,一次一条——这个执行顺序被称为CPU的控制流 当前执行程序的信息: 临时数据:%rax... 栈顶地址:%rsp... 当前指令地址/下一条:%rip...(X86架构下的PC) 条件码:CF, ZF, SF, OF 改变控制流的两种机制:Jump and branch, Call and return

2. 算术指令

addq Src, Dest

CF 进位标志:可用于检测无符号整数运算的溢出 SF set if t < 0 ZF set if t == 0 OF set if 补码运算溢出(即带符号整数运算)

leaq指令不设置条件码

cmpq Src2, Src1 (≈计算Src1-Src2)

CF set if carry out from most significant bit:可用于无符号数的比较 ZF set if a == b SF set if (a-b) < 0 OF set if two's complement overflow:补码计算溢出

Testq Src2, Src1

计算Src1 & Src2并设置相应的条件码,但是不改变目的操作数

ZF set when a&b == 0 SF set when a&b < 0

CF = 0, OF = 0

SetX指令 读取当前的条件码(或者某些条件码的组合),并存入目的字节寄存器(余下的7个字节不会被修改)

image-20221014172843421

[EG/通常使用movzbl指令对目的寄存器进行“0”扩展]

image-20221014173550554

3. 跳转指令

依赖当前的条件码选择下一条执行语句(是否顺序执行)

image-20221014173733884

[EG/编译指令:gcc -Og -S -fno-if-conversion control.c]

image-20221014173825599

C allows goto statement. Jump to position designated by label.

4. 条件移动

语义:if (Test) Dest<=Src

条件跳转指令对于现代流水线处理器的执行效率有很大的负面影响 条件移动指令可以避免这一现象

[EG]

image-20221014174532812

条件转移指令的局限性

  1. Then-Expr或Else-Expr表达式有“副作用”
  2. Then-Expr或Else-Expr表达式的计算量较大
微体系结构背景

处理器流水线(五级流水示例)

Instruction Fetch (IF) Read Registers (RD) Arithmetic Operation (ALU) Memory Access (MEM) Write Back (WB)

现代的通用处理器,支持深度流水线以及多发射结构 条件跳转指令往往会引起一定的性能损失,因此需要尽量消除

5. 循环/分支指令

image-20221018002925919

处理 Fall-Through

image-20221018003444706

跳转表的适用范围:其它情况下会使用平衡二叉树的结构,提升性能

 

2.3 函数与栈帧

基于栈的编程语言

(memory management相关)

  1. 支持递归 [EG] C, Pascal, Java 代码是可重入的(Reentrant)——同时有同一个函数的多个实例在运行 因此需要有一块区域来存储每个函数实例的数据——参数/局部变量/返回地址
  2. 栈的工作规律 每个实例的运行时间是有限的,即栈的有效时间有限——From when called to when return 被调用者先于调用者返回(一般情况下)
  3. 每个实例在栈中维护一个栈帧(stack frame)
1. 调用与返回(Passing control)
2. 传参(Passing data)
3. 栈帧(stack frame)
4. 寄存器使用惯例

通用寄存器分为两类: 调用者负责保存:Caller在调用子函数之前将这些寄存器内容存储在它的栈帧内,调用后恢复 被调用者负责保存:Callee在使用寄存器之前将其原有内容存储在它的栈帧内,退出前恢复

image-20221024163211363

5. 实例:递归

volatile变量:强制使用栈空间(直接存取原始内存地址);但在实际使用中没有修改栈顶寄存器%rsp

通常而言,局部变量一般:① 被优化 ② 寄存器 ③ 运行栈,volatile相当于保证了③

Red Zone: The 128-byte area beyond the location pointed to by %rsp is considered to be reserved.

 

2.4 X86-32的函数调用

编译选项:-m32

  1. 传参直接压栈(对比X86-64会采用寄存器存储)

    image-20221024210356120

  2. 只有8个通用寄存器,caller-callee关系如下:

    image-20221024210451191

思考:“打破”函数调用与返回顺序

image-20221218115730824

image-20221227124423304

 

2.5 复合数据类型的汇编表示

基本数据类型

image-20221031095611683

1. 数组与数组访问

一维数组T A[L] 基本数据类型:T, 数组长度:L 连续存储在大小为L * sizeof(T)字节的空间内

二维数组

  • 嵌套数组T A[R][C] (R行C列) 数组所占据的空间大小:R×C×K bytes (T元素占据K字节) 行优先存储(Row-Major Ordering)

    image-20221031100905505

  • Multi-Level Array / Nested Array 变量univ是一个指针数组,数组长度为3,数组元素长度为8 bytes(int* univ[3] = {a, b, c} 每个指针指向一个整数数组 与嵌套数组的区别: image-20221031101915407

2. 结构与结构访问

结构的内存存储分布(Memory Layout) 连续分配的内存区域,内部各个域通过名字访问 各个域依声明顺序存放 结构的整体大小与各个域的位置布局由编译器决定(汇编层面不了解结构/域等信息)

  • 对齐的一般原则 已知某种基本数据类型的大小为K字节,那么其存储地址必须是K的整数倍(X86-64) 计算机访问内存一般是以内存块为单位的,块的大小是地址对齐的,如4/8/16字节对齐等(如果数据访问地址跨越“块”边界会引起额外的内存访问)
  • 编译器的工作 在结构的各个元素间插入额外空间来满足不同元素的对齐要求

 

2.6 链接

数据、函数、指令的“定位”:

image-20221107170444511

1. 静态链接(Static Linking)

源文件 => 编译生成可重定位对象(Relocatable Object)文件 => 完全链接后产生可执行对象文件(executable object file)

image-20221107112233692

程序链接的作用

  1. 模块化好 多个小的源文件可以链接成一个程序,而不是一个巨大的单一源文件 可以将多个通用函数链接成库文件
  2. 工作效率高 省时间:独立编译 某个原文件被修改后,可以独立编译并重链接,而不需要编译所有文件 省空间:库文件 多个通用函数可以被集成到一个文件中 可执行文件及其运行时的内存镜像(memory image)内只包含有实际使用到的函数
符号解析 / Symbol Resolution

define symbol: void swap() {...} reference symbol: swap(); define symbol and reference: int *xp = &x;

Symbol table is an array of structs Each entry includes name, size and location of symbol

重定位 / Relocation
  1. 将多个文件的数据 / 代码段集成为单一的数据段和代码段
  2. 将.o文件中的符号解析为绝对地址
  3. 然后将所有的符号引用更新为这些新的地址
2. 对象文件

标准二进制格式:Executable and Linkable Format (ELF) 最初由AT&T System V Unix系统采用,后来被广泛采用

Windows uses the Portable Executable (PE) format Mac OS-X uses the Mach-O format Modern x86-64 Linux and Unix systems use Executable and Linkable Format

ELF 文件的格式

readelf main.o -h:显示ELF header信息 readelf main.o -S:显示section headers信息 readelf main.o -e:相当于-h, -l, -S readelf main.o -s:显示.symtab段的信息 readelf main.o -r:显示重定位段信息 readelf main.o -a:显示所有信息

image-20221107160151140

image-20221126201735010

In Section 7.5, we saw how the compiler assigns symbols to COMMON and .bss using a seemingly arbitrary convention. Actually, this convention is due to the fact that in some cases the linker allows multiple modules to define global symbols with the same name. When the compiler is translating some module and encounters a weak global symbol, say, x, it does not know if other modules also define x, and if so, it cannot predict which of the multiple instances of x the linker might choose. So the compiler defers the decision to the linker by assigning x to COMMON. On the other hand, if x is initialized to zero, then it is a strong symbol (and thus must be unique by rule 2), so the compiler can confidently assign it to .bss. Similarly, static symbols are unique by construction, so the compiler can confidently assign them to either .data or .bss.

符号表 / Symbol Tables
  1. 全局符号 / Global symbols, defined by module m e.g. nonstatic C 函数,nonstatic 全局变量
  2. 外部符号 / External symbols, referenced by module m e.g. nonstatic C 函数,nonstatic 全局变量
  3. 局部符号 / Local symbols e.g. static 全局变量(不是local variables) 这部分变量不保存在栈上,编译器在.data.bss分配空间并且在symbol tables里分配独一无二的别名(e.g. 两个函数分别定义static int x = 0,则在汇编器中定义的别名不同)

image-20221231093356357

Symbol tables are built by assemblers, using symbols exported by the compiler into the assembly-language .s file.

image-20221107163137608

全局变量重复定义

Strong: procedures and initialized globals Weak: unintialized globals

Rule.1 Multiple strong symbols with the same name are not allowed. Otherwise: Linker error

Rule.2 Given a strong symbol and multiple weak symbols with the same name, choose the strong symbol.

Rule.3 Given multiple weak symbols with the same name, choose any of the weak symbols. can override this with gcc -fno-common

3. 代码与数据重定位
  1. Relocating sections and symbol definitions merge sections from relocatable object files to executable object files assign run-time memory addresses to the new aggregate sections, to each section defined by the input modules, and to each symbol defined by the input modules
  2. Relocating symbol references within sections modify every symbol reference in the bodies of the code and data sections so that they point to the correct run-time addresses

image-20221107164455664

Relocation Entry

image-20221126204650939

image-20221126204836786

these two relocation types support the x86-64 small code model programs larger than 2GB can be compiled using the -mcmodel=medium/large flags

计算过程

objdump -dx main.o: disassembled code

重定位算法

image-20221126205739331

e.g. 重定位信息(main.o, sum.o

image-20221107171005031

image-20221107171251771

此处,为什么“inter_sum - 0x4"

  1. 机器码e8对应的含义是call
  2. 该行占据的指令数为5bytes,因此执行到改行时,PC = 0x400507 + 0x5 = 0x40050c
  3. 后续的4组数字/4bytes为跳转函数的地址与当前地址的offset。且由于LSB (Least Significant Byte, 低位字节占据高地址),此处实际表示的offset = 0xffffffca
  4. 因此计算得到函数的地址为PC + offset = 0x40050c + 0xffffffca = 0x4004d6
  5. -4:按照400508+?=4004d6计算的结果需要减去4才是正确结果 这里,400508为400507跳过callq指令的机器码
4. 重定位文件

a typical executable file:

  • the ELF header includes the program’s entry point, which is the address of the first instruction to execute when the program runs

  • Segment header table

    image-20221126215037078

    From the program header table, we see that two memory segments will be initialized with the contents of the executable object file. Lines 1 and 2 tell us that the first segment (the code segment) has read/execute permissions, starts at memory address 0x400000, has a total size in memory of 0x69c bytes, and is initialized with the first 0x69c bytes of the executable object file, which includes the ELF header, the program header table, and the .init, .text, and .rodata sections.

    Lines 3 and 4 tell us that the second segment (the data segment) has read/write permissions, starts at memory address 0x600df8, has a total memory size of 0x230 bytes, and is initialized with the 0x228 bytes in the .data section starting at offset 0xdf8 in the object file. The remaining 8 bytes in the segment correspond to .bss data that will be initialized to zero at run time.

  • the .init section defines a small function, called _init, that will be called by the program’s initialization code

  • the .text, .rodata and .data sections are similar to those in a relocatable object file, except that these sections have been relocated to their eventual run-time memory addresses

  • since the executable is fully linked (relocated), it needs no .rel sections

For any segment, the linker must choose a starting address, vaddr, such that vaddr mod align = off mod align where off is the offset of the segment’s first section in the object file, and align is the alignment specified in the program header (221 = 0x200000).

image-20221126214524180

Loading Executable Object Files

Since ./main does not correspond to a built-in shell command, the shell assumes that main is an executable object file, which it runs for us by invoking some memory-resident operating system code known as the loader. Any Linux program can invoke the loader by calling the execve function. The loader copies the code and data in the executable object file from disk into memory and then runs the program by jumping to its first instruction, or entry point. This process of copying the program into memory and then running it is known as loading.

image-20221126230832034

image-20221231103324011

Next, the loader jumps to the program’s entry point, which is always the address of the _start function. This function is defined in the system object file crt1.o and is the same for all C programs. The _start function calls the system startup function, _libc_start main, which is defined in libc.so. It initializes the execution environment, calls the user-level main function, handles its return value, and if necessary returns control to the kernel.

 

image-20221107171446967

5. 静态库文件 vs 共享库文件

static library vs shared library

静态库文件(.a文件)
  1. 将多个相关的重定位对象文件集成为一个单一的带索引的文件 (称为归档文件, archive file)
  2. 增强链接器的功能使之能够在归档文件中解析外部符号
  3. 如果归档文件中的某个成员解析了外部符号,就将其链接入执行文件

image-20221107171649576

静态库连接

image-20221107172005376

共享库文件

A shared library is an object module that, at either run time or load time, can be loaded at an arbitrary memory address and linked with a program in memory. This process is known as dynamic linking and is performed by a program called a dynamic linker. Shared libraries are also referred to as shared objects, and on Linux systems they are indicated by the .so suffix. Microsoft operating systems make heavy use of shared libraries, which they refer to as DLLs (dynamic link libraries)

  • one .so file for a particular library
  • a .text section in memory shared by different running processes
  1. build a shared library libvector.so linux> gcc -shared -fpic -o libvector.so addvec.c multvec.c (-shared: direct the loader to create a shared object file; -fpic: directs the compiler to generate position-independent code)

  2. link to program linux> gcc main2.c ./libvector.so -o prog2l This creates an executable object file prog2l in a form that can be linked with libvector.so at run time.

    None of the code or data sections from libvector.so are actually copied into the executable prog2l at this point. Instead, the linker copies some relocation and symbol table information that will allow references to code and data in libvector.so to be resolved at load time

    prog2l contains a .interp section, which contains the path name of the dynamic linker, which is itself a shared object (e.g., ld-linux.so on Linux systems). The loader loads and runs the dynamic linker

  3. dynamic linker: fully linked executable in memory

    relocate the text and data of libc.so into some memory segment. relocate the text and data of libvector.so into another memory segment. relocate any references in prog2l to symbols defined by libc.so and libvector.so

image-20221126232024735

image-20221218142128201

image-20221218142145179

 

3 内存

3.1 内存布局 memory layout

Linux进程的内存布局(X86-64)

image-20221107100650764

其实,Heap和Data没有一条泾渭分明的分界线

 

3.2 缓冲区溢出 buffer overflow

易受攻击的缓冲区相关代码

image-20221107101421023

image-20221107101536554

image-20221107101655632

image-20221107101745995

1. 代码注入攻击

输入字符串包含可执行代码的字节表示 用缓冲区B的地址覆盖函数返回地址A 当程序执行ret时,将跳转到“注入代码” (绝对地址)

预防手段
  1. avoid overflow vulnerabilities 不要用gets(), puts()这种函数

    使用限制字符串长度的库函数 fgets instead of gets strncpy instead of strcpy Don’t use scanf with %s conversion specification use fgets to read the string or use %ns where n is a suitable integer

  2. employ system-level protections

    随机的stack base地址

    image-20221107102740478

    (处理器支持的)不可执行段 mark region of memory as either "read-only" or "writeable" X86-64 added explicit "execute" permission stack marked as non-executable ——Any attempt to execute this code (exploit code) will fail!

  3. have compiler use "stack canaries"

    Stack Canaries (金丝雀) can help

    Idea Place special value (“canary”) on stack just beyond buffer Check for corruption before exiting function

    GCC Implementation -fstack-protector Now the default (disabled earlier)

    image-20221107103412279

    image-20221107103445647

2. Return-Oriented Programming (ROP) Attacks

Challenges for hackers Stack randomization makes it hard to predict buffer location Marking stack non-executable makes it hard to insert binary code

Alternative Strategy:

image-20221107104433468

3. Supplement: COP / JOP attack

 

补充:IO操作的响应速度与协程

协程:可以主动切出,并在之后可恢复运行的函数运行模式 协程的用途:在CPU进行IO操作时,切换到其它Ready函数的执行现场,并在合适时间点切换回来

线程/进程/协程 - 知乎

image-20221030103900504

image-20221030103543095

冯诺依曼架构所面临的“存储墙”和“功耗墙”——存算分离

image-20221031113854109

 

3.3 虚存

笔记

虚存是一个概念(而非实体) :一个连续的地址空间,进程的数据、代码、运行栈、堆等存于其中

物理地址Physical Addressing 虚拟地址Virtual Addressing (VM)

image-20221119095424807

Advantages:

  1. uses main memory efficiently use DRAM as a cache for the parts of a virtual address space
  2. simplifies memory management each process gets the same uniform linear address space
  3. isolates address spaces one process can't interfere with another's memory user program cannot access privileged kernel information
1. VM用于缓存

Virtual memory is an array of N contiguous bytes stored on disk. The contents of the array on disk are cached in physical memory (DRAM cache) These cache blocks are called pages (size is P=2p bytes)

image-20221119100114074

image-20221119185234409

large page (block) size: typically 4-8 KB, sometimes 4 MB or larger

Page Tables / 页表:an array of page table entries (PTEs) that maps virtual pages to physical pages per-process kernel data structure in DRAM

valid bit+nbit address field "whether cached" + "DRAM offset"

image-20221119101140595

工作原理

  1. Virtual memory works because of locality(数据/指令的局部性)
  2. At any point in time, programs tend to access a set of active virtual pages called the working set (活跃页面的集合称为工作集)
  3. If (working set size < main memory size): Good performance for one process after compulsory misses
  4. If ( SUM(working set sizes) > main memory size ) : (Thrashing) Performance meltdown where pages are swapped (copied) in and out continuously
2. VM 作为内存管理工具

每个进程都有自己的私有虚拟空间

Mapping function scatters addresses through physical memmory 同一块物理地址可以被不同的virtual address映射——节省空间 / sharing code and data among processes

image-20221119103214932

3. VM 作为内存保护工具

Extend PTEs with permission bits (PTE: page table entry) Page fault handler checks these before remapping If violated, send process SIGSEGV (segmentation fault)

不同的权限分配:

image-20221119103452594

violating consequence: the kernel exception handler sends a SIGSEGV signal to the offending process (Linux reports this as a segmentation fault)

 

3.4 虚存地址翻译/转换

Notation

PTE: Page table entry (value, *PTEA), 存在内存里 PTEA: Page table entry address (key, 地址的概念), 内存地址 PTBR: Page table base register TLB(): Translation lookaside buffer

  1. Basic Parameters

    N=2n: Number of addresses in VAS (virtual address space), Must be a power of 2 M=2m: Number of addresses in PAS (physical address space), Need not be a power of 2 P=2p: Page size (bytes) T=2t: the sets that TLB has

  2. Components of VA

    VPO: Virtual page offset (bytes) VPN: Virtual page number TLBL: TLB index TLBT: TLB tag

  3. Components of PA

    PPO: Physical page offset (same as VPO) PPN: Physical page number CO: Byte offset within cache block CI: Cache index CT: Cache tag

image-20221119200003829

1. Page Hit

page hit is handled entirely by hardware

image-20221119200619660

2. Page Fault

page fault requires cooperation between hardware and the operating system kernel

image-20221119200628969

image-20221119202739093

3. TLB (Translation Lookaside Buffer)

Small hardware cache in MMU Contains complete page table entries for small number of pages Maps virtual page numbers to physical page numbers

[EG]:

image-20221119203812938

4. Multilevel page table
5. Cache

image-20221119204503080

 

3.5 内存映射 memory mapping

Linux Virtual Memory System

  1. The virtual memory of a Linux process

    image-20221119212152001

  2. Linux Page Fault Handling

    image-20221120094325150

Memory Mapping

Area can be backed by...

image-20221230161336351

image-20221120094636673

image-20221218155346218

Shared Objects

image-20221227155741334

execve

image-20221231113945598

 

3.6 内存分配

用户层的动态内存分配

image-20221121095416891

Performance Goal
碎片化问题

Poor memory utilization caused by fragmentation

1. 内部碎片化 internal fragmentation

payload is smaller than block size

image-20221121100624024

2. 外部碎片化 external fragmentation

image-20221121100655450

3. header field / header

Thus we know how many memory to free given just a pointer.

image-20221121100757352

image-20221121100913560

4. Implicit List 实现
1) Finding a Free Block

image-20221121153724584

First Fit

Next Fit: search list starting where previous search finished Best Fit: choose the best free block (fits, with fewest bytes left over)

2) Splitting

image-20221121154851363

3) Freeing a Block

Join with next/previous blocks:

image-20221121155300384

Bidirectional Coalescing / 双向

Boundary Tags [Knuth73]

image-20221121155753220

image-20221121155854423

然而Boundary Tages会产生内部碎片。

4) Other Policies

 

4 异常控制流 ECF

Code Example

异常:各个任务切换的重要机制

Changing Control Flow:

Exceptions
  1. Exceptions: transfer of control to the OS in response to some event (从用户态到核心态 user => kernel)

    Examples: div by 0, arithmetic overflow, page fault, I/O request completes, Ctrl-C...

    processor pushses additional processor state onto the stack necessary to restart the interrupted program when the handler returns

    Control transferred from user program to the kernel: items pushed on the kernel's stack rather than onto the user's stack

    image-20221121161723037

  2. Interupt Vectors(中断向量/异常处理向量)

    each type of event has unique exception number k handler k is called each time exception k occurs

    exception table: used to form the address of the appropriate exception handler exception table base register: contains the starting address of the exception table

    image-20221121162524285

    image-20221129093110833

    image-20221129094323386

  3. Asynchronous Exceptions(异步异常/中断/Interrputs)

    Caused by events external to the processor(外设触发) indicated by setting the processor's interrupt pin handler returns to "next" instruction

    image-20221129093225264

    Examples:

    • I/O interrputs hitting Ctrl-C at the keyboard arrival of a packet from a network arrival of data from a disk
    • Hard reset interrupt hitting the reset button
    • Soft rest interrupt hitting Ctrl-Alt-Delete on a PC
  4. Synchronous Exceptions(同步异常)

    Caused by events that occur as a result of executing an instruction(由指令执行引起). This instruction is referred as the faulting instruction

    • Traps Intentional(程序故意引起的): system calls, breakpoint traps, special instructions Returns control to "next" instruction

      reading a file (read) creating a new process (fork) loading a new program (execve) terminating the current process (exit)

      image-20221121163434436

    • Faults Unintentional but possibly recoverable(程序无意引起的): page faults (recoverable), protection faults (unrecoverable), floating point excetions Either re-executes faulting ("current") instruction or aborts

      image-20221121163615730

      image-20221121163718871

    • Aborts Unintentional and unrecoverable: parity error, machine check Aborts current program

syscall

System calls are provided on x86-64 systems via a trapping instruction called syscall. It is quite interesting to study how programs can use this instruction to invoke Linux system calls directly. All arguments to Linux system calls are passed through general-purpose registers rather than the stack. By convention, register %rax contains the syscall number, with up to six arguments in %rdi, %rsi, %rdx, %r10, %r8, and %r9. The first argument is in %rdi, the second in %rsi, and so on. On return from the system call, registers %rcx and %r11 are destroyed, and %rax contains the return value. A negative return value between −4,095 and −1 indicates an error corresponding to negative errno.

image-20221231141206532

 

4.1 Processes(进程)

1. 基本概念

A process is an instance of a running program(程序的一个运行实例)

Process provides each program with two key abstractions:

maintained by... 多进程交错运行、快速切换 and 由虚拟内存系统管理地址空间

image-20221121165540677

  1. Save current registers in memory

  2. Schedule next process for execution

  3. Load saved registers and switch address space (context switch)

    (with one CPU core)

    image-20221121165716246

2. 并发进程 Concurrent Processes

Two processes run concurrently if they flows overlap in time ——Otherwise, they are sequential

3. 进程切换 Context Switching

Processes are managed by a shared chunk of OS code called the kernel [Important]: The kernel is NOT a separate process, but rather runs as part of some user process

image-20221121170151847

4. fork: 创建新进程

creates a new process (child process) that is identical to the calling process (parent process) return 0 to the child process return child's pid to the parent process

fork is called once but returns twice

image-20221121171139051

image-20221121171333528

  1. create VM for new process create exact copies of current mm_struct, vm_area_struct and page tables flag each page in both processes as read-only flag each vm_area_struct in both processes as private COW

  2. fork的机制

  3. exit:终止当前进程 atexit(): registers functions to be executed upon exit

5. Zombies:僵尸进程

子进程比父进程先结束,进程终止后仍然会消耗系统资源:various tables maintained by OS

Reaping / 回收:父进程对终止的子进程进行回收(父进程获取子进程的退出状态信息),内核释放子进程的剩余资源

如果父进程没有回收子进程,那么子进程会被init process回收

status=256×exit_value_of_the_child

6. execve

loading and running programs in the context of the current process

execve is called once and never return The execve function loads and runs the executable object file filename with the argument list argv and the environment variable list envp (name = value strings). Execve returns to the calling program only if there is an error, such as not being able to find filename.

image-20221130145308753

image-20221130145641367

 

4.2 壳程序

shell: an interactive application-level program that runs other programs on behalf of the user.(“提供用户使用界面”的程序)

the original shell was the sh program, followed by csh, tcsh, ksh and bash perform a sequence of read/evaluate steps and then terminates

main routine:

image-20221130151641769

image-20221130151930171

BASH ps 命令

ps -l

image-20221205154634917

F:进程的权限标志 S:进程的状态(stat),主要状态有:R (running), S (sleep), T (stop), D (不可被唤醒), Z (zombie) UID / PID / PPID:此进程被UID所拥有 / 进程的PID号 / 此进程的父进程的PID PRI / NI:priority/nice的缩写,表示该进程被CPU所执行的优先顺序(越小越优先) ADDR / SZ / WCHAN:该进程在内存的部分 / 此进程用掉多少内存 / 目前进程是否在工作 TTY:登入这的终端机位置,若为远程登入则使用动态终端介面(pts/n) TIME:使用掉的CPU时间 CMD:指令

ps -t

image-20221205155325305

jobs 查看挂起的进程信息

前 / 后台程序

background job: [command] & 不需要等待进程结束

前台程序 correctly waits for and reaps foreground jobs 但是,后台程序终止时会变成“僵尸”(因为可能永远不会被回收) 这会导致内存泄漏,导致OS内核资源不足(一旦超过了进程配额 process quota),shell就不能运行任何新命令,即fork()返回-1

当一个后台进程完成时,OS内核将采取一种类似“异常”(ECF)的机制来“提醒“ shell。这一机制被称为信号 signal

 

4.3 信号处理 Signals

1. 信号

Definition: A signal is a small message that notifies a process that an event of some type has occurred in the system

类似于“异常” 其类型由小整数ID表示(1-30)kill -S -id 由内核(有时在另一个进程的请求下)发送给进程 传递的唯一的信息是其自身(即ID和它到达这一事实)

image-20221130163825180

Default Action 还包括:暂停(直至恢复)

image-20221208201235432

2. Installing Signal Handlers
  1. Signals Handlers as Concurrent Flows

    A signal handler is a separate logical flow (not process) that runs concurrently with the main program

    具有独立的stack与处理器上下文(CPU context / control flow) 内存是共享的

    image-20221130202205349

  2. pending signals are not queued 因信号无法缓存,多个同类信号同时到达的话会丢失若干

  3. living with nonqueuing Signals: must check for all terminate jobs 检查所有的终止子进程:wait

  4. signal arrival during long system calls (say a read) 系统调用执行过程中信号到达了,signal handler会中断read call,signal handler结束后read call会继续自动进行

  5. signal handler reacts to externally generated events

    Function is async-signal-safe if either reentrant(仅有局部变量) or non-interruptible by signals 即当前的函数不会被signal所屏蔽掉 async-signal-safe wrapper for printf: safe_print

    image-20221130204109789

 

4.4 非本地跳转 Nonlocal jumps

1. setjmp & longjmp

Powerful (but dangerous) user-level mechanism for transferring control to an arbitrary location Controlled to way to break the procedure call / return discipline Useful for error recovery and signal handling

efficient in deeply nested functions

image-20221130205349411

image-20221227124423304

2. Limitations of Nonlocal Jumps

Work with stack discipline: can only long jump to environment of function that has been called but not yet completed

image-20221130205824454

image-20221205100051480

sigsetjmp() and siglongjmp() are versions of setjmp() and longjmp() that can be used by signal handlers

小结

 

4.5 ECF小结

  1. fork()

  2. sleep()

  3. pause()

  4. execve()

  5. kill()

  6. signal()

  7. jump()

     

 

5 IO处理

5.1 Unix I/O

1. Unix Files

Definition:

A Unix file is a sequence of m bytes All I/O devices are represented as files(IO设备都视作文件) /dev/sda2 (/usr disk partition) /dev/tty2 (terminal) Even the kernel is represented as a file /dev/kmem (kernel memory image) /proc (kernel data structures)

vim limits vim maps

Opening Files

image-20221205173329951

Return a small identifying integer file descriptor(文件描述符) the descriptor returned is always the smallest descriptor that is not currently open in the process fd == -1:an error occurred

Each process created by a Unix shell begins life with three open files associated with a terminal (files are recorded in a FD_TABLE) 0: standard input 1: standard output 2: standard error (diagnostic) output

Closing Files

Closing an already closed file is an error Moral: always check return codes, even for seemingly benign functions such as close()

Reading Files

读取文件内容并更新文件访问位置

returns number of bytes read from file fd into buf

Return type ssize_t is signed integer nbytes < 0 indicates that an error occurred Short counts (nbytes < sizeof(buf)) are possible and are not errors

Writing Files

写入文件内容并更新文件访问位置

returns number of bytes written from buf to file fd

nbytes < 0 indicates that an error occurred As with reads, short counts are possible and are not errors

dealing with Short Counts

image-20221205110816003

 

5.2 RIO (robust I/O) package

 

5.3 Metadata, sharing and redirection

1. File Metadata(元数据)

Definition: metadata is data about data, in this case file data

Per-file metadata maintained by kernel Accessed by users with the stat and fstat functions

image-20221205111325879

2. Accessing File Metadata

image-20221205111741459

3. Unix Kernel Representation of Open Files

image-20221205112217955

File sharing:

image-20221205113339088

After call to fork:

image-20221205180755067

4. I/O Redirection(重定向)

Example: call dup(4, 1)

image-20221205185447548

Remarks: 此时fd1和fd4连续调用read(fd, &buffer, 1)得到的&buffer将不同(因为其中一个调用完read之后file pos的值也改变了)

Remarks:几种可能的Descriptor table和Open file table的关系

  • 一个进程的多个descriptor table entry一一对应Open file table中的不同项 这是最一般的情况(多次打开同一个文件也是如此)
  • 不同进程的同一descriptor table entry对应Open file table中的同一项 子进程继承父进程
  • 同一进程的不同descriptor table entry对应Open file table中的同一项 dup / dup2

 

5.4 Standard I/O

The C standard library (libc.so) contains a collection of higher-level standard I/O functions

Opening and closing files: fopen, fclose Reading and writing bytes: fread, fwrite Reading and writing text lines: fgets, fputs Formatted reading and writing: fscanf, fprintf

特点:流式文件处理(a stream is a pointer to a structure of type FILE) ANSI C program begins with three open streams: stdin, stdout, stderr

image-20221205195523112

  1. 通过fflush()调用显示地flush到输出文件 / 检测到换行符输出

  2. strace ./main显示进程的系统调用信息

  3. 不能用在二进制文件上的函数: 基于行处理的IO:fgets, scanf, printf, rio_readlineb

    不同系统解释换行符的方式不同0x0a (\n): Linux和Mac OS X:LF(0x0a), [\n] HTTP服务器 & Windows:CR+LF(0x0d 0x0a), [\r\n] 使用rio_readnrio_readnb替代

  4. 字符串处理函数:strlen, strcpy——解释字节0为串结尾

Pros and Cons: Unix I/O
Pros and Cons: Standard I/O

 

 

6 线程与线程同步

6.1 线程基本概念

1. 进程与线程

进程:Process = process context + code, data and stack 进程上下文(含处理器状态 以及 内核上下文)与虚存空间

image-20221212101247339

线程:Process = thead + code, data and kernel context

线程thread 和 进程progress 的区别:有独立的处理器状态,和主进程共享内核上下文 线程thread 和 协程coroutine 的区别:前者OS调用,后者为用户调用thread

image-20221212101310412

  1. Process with Two Threads

image-20221212101722643

每个线程有自己的处理器状态:CPU context 每个线程有自己的线程编号:TID

  1. Threads vs Processes

image-20221212101942141

并发(concurrency):一个处理器同时处理多个任务(逻辑上的同时发生,依赖于轮询) 并行(parallel):多个处理器或者多核处理器同时处理多个不同的任务(物理上的同时发生)

image-20221212102503785

image-20221212102356684

2. Pthreads Interface

Pthreads: Standard interface for ~60 functions that manipulate threads from C programs

3. Pros and Cons of Thread-Based Designs

Pros: 易于在线程之间共享数据;线程比进程更有效率 Cons:无意的共享可能会导致微妙的、难以重现的错误

 

6.2 共享(sharing)

Definition:一个变量x是共享的,当且仅当多个线程引用x的某个实例

1. Threads Memory Model

概念模型

操作层面

2. Mapping Variable Instances to Memory

全局变量: 定义在所有函数外面的变量 虚拟内存包含全局变量的唯一实例 局部变量:定义在函数内部的变量,无static属性 每个线程栈包含每个局部变量的一个实例 局部静态变量:定义在函数内部,有static属性 虚拟内存包含局部静态变量的唯一实例

 

6.3 互斥(mutual exclusion)

1. 进程图 progress graph

We can analyze the behavior using a progress graph 进度图 A trajectory is a sequence of legal state transitions that describes one possible concurrent execution of threads. 轨迹描述了状态转换的顺序,它代表一种可能的线程并发执行方式。 critical section 临界区:w.r.t the shared variable cnt

image-20221212141551215

2. 同步 synchronize

We must synchronize 同步 the execution of the threads so that they never have an unsafe trajectory (i.e., need to guarantee mutually exclusive access to critical region)

经典方法Semaphores 信息量 (Edsger Dijkstra)

Other Approaches: Mutex and condition variables (Pthreads) Monitors (Java)

 

6.4 信息量 Semaphores

Semaphore:非负整数的,用于同步的全局变量 由P、V操作组成:P(s): [ while (s == 0) wait(); s--; ], V(s): [ s++; ] OS内核确保[]之间的操作不可分割:一次只有一个P或者V操作修改s;只有P能减小s Semaphore invariant: (s >= 0)

Pthreads functions:

1. Using Semaphores for Mutual Exclusion

Basic idea:

Terminology

2. goodcnt.c: 正确同步的例子

However, it's much slower than badcnt.c

image-20221212154631028

3. 共享资源

基本思想:线程使用信号量来通知另一个线程某些条件为真 使用计数信号量来跟踪资源使用状态,并通知其他线程 使用互斥信号量保护对共享资源的访问

典型的例子:生产者 / 消费者问题 The Producer-Consumer Problem

image-20221212155715258

N个元素的生产者消费者问题:需要一个互斥信号量和两个计数信号量 mutex:对缓冲区的互斥访问 slots:计算缓冲区中可用的槽位 items:计算缓冲区中可用的条目 slots + items = n

  1. 结构体实现

    • 补充:理发师理发问题

      image-20221219153433049

  2. 竞争:程序的正确性依赖于某种顺序

    发生race的示例

    主线程对i的改变与子线程对vargp的引用之间发生竞争

    race的消除

4. 死锁

定义:等待一个永远不会发生的条件 经典场景:进程1和进程2需要两个资源A和B以继续,进程1获得了A等待B,进程2获得了B等待A

image-20221212163258401

image-20221212163409713

避免死锁:使用同样的顺序加锁

image-20221212163503164

image-20221212163522888

 

7 X86 汇编编程

7.1 Hello World

基本概念

gcc -S -Og helloworld.c

  1. 汇编指示 Directives 汇编代码中以.开头的行都是汇编指示,如.file.def.text等,用以直到汇编器如何进行汇编。其中.file.def.CFI等均用于调试(可以忽略) 示例:.globl main表示汇编器符号main是全局的,.LC0则不是全局可见的
  2. 符号 Symbol :结尾的字符串(如main:)是用以表示变量或者函数的地址的符号
  3. .text.section .text表示代码段 .data.section .rodata表示(只读)数据段
  4. 按16字节对齐:.p2align 4,,15指定下一行的对齐方式第一个参数表示按2的4次幂字节对齐,第二个参数表示对齐时额外空间用什么数据来填充,第三个参数表示最多允许额外填充多少字节

int c = 1;:初始化全局变量

编译命令

as my_file.s -o my_object_file.o:加入-gstabs选项产生带调试信息的object文件 ld my_object_file.o -o my_exe_file:可以有多个.o文件;加入-g选项产生调试信息

"Hello World.c"

1. 系统调用

X84-64 Linux下的系统调用是通过syscall指令(一种异常机制)实现的 %rax存放系统调用的功能号 传给系统调用的参数必须按顺序存放到寄存器%rdi, %rsi, %rdx, %r10, %r8, r9 当系统调用完成之后,返回值可以再寄存器%rax中获得(一般<0表示错误)

image-20221219161741374

image-20221219161809663

image-20221219161830176

2. 处理命令行参数

相当于C语言形式int main(int argv, char *argv[]) (*) 当一个可执行程序通过命令行启动时,命令行参数将被保存到

gdb调试

image-20221219165604342

3. 调用lib_c库函数

汇编命令:as hello.s -o hello.o ld -lc -dynamic-linker /lib64/ld-linux-x86-64.so.2 hello.o -o hello

 

7.2 汇编小结

  1. 三个常用的段

    .data:数据段(声明带有初始值的数据) .bss:数据段(声明无需初始化的数据) .text:正文段(程序指令)

  2. 程序入口地址:_start符号表示默认的起始点 如想汇编内部的符号能够被外部模块访问,需赋予.globl属性,如.globl _start

1. data段

类型说明

image-20221219182729738

2. bss段

data段不同, 无需声明特定的数据类型, 只需声明为所需目的保留的一定长度的内存即可 .comm声明为未初始化的全局内存区域 .lcomm 声明为未初始化的局部内存区域

image-20221219183504899

image-20221219184034668

image-20221219184056271

 

7.3 汇编示例

1. 阶乘(factorial.s

image-20221219185018399

2. C语言调用汇编
3. 文件处理示例