计算机体系结构 (Computer Architecture)(狭义):系统程序员所看到的计算机的属性,即计算机的逻辑结构和功能特征,包括其各个硬部件和软部件之间的相互关系。(example: IBM 360)
冯·诺依曼架构 (Von Neumann):① 采用二进制逻辑,② 程序存储执行, ③ 五部分组成(Input / Output Device, CPU (Central Processing Unit) = Control Unit + ALU (Arithmetic / Logic Unit) + registers, Memory Unit)
计算机组成 (Computer Organization):计算机系统的逻辑实现,与“计算机体系结构”并称为广义概念上的计算机体系结构
计算机实现 (Computer Implementation):计算机系统的物理实现
bit (比特):1个二进制位
Byte (字节):8个二进制位,
存储单位换算(kilo, Mega, Giga, Tera, Peta, Exa)
机器字 (Machine Word):计算机进行一次整数运算所能处理的二进制数据的位数,通常也指数据地址长度
32位字长:地址的表示空间是4GB
64位字长:地址的表示空间约为
机器字在内存中的组织:地址按照字节 (byte)来定位
机器字中的第一个字节的地址
相邻机器字的地址相差8
字节序 (Byte Ordering) Big Endian:低位字节 (Least significant byte, LSB) 占据高地址 Little Endian:高位字节 (Most significant byte, MSB) 占据高地址✓
C语言中基本数据类型的大小 (in Bytes)
重点记忆:
补码加法公式:
C语言中的无符号数与带符号数
U
作为后缀表示无符号数无符号数加法
补码加法
1
或0
),但是发生舍入错误(向下取整而非向零:-69.5 => -70)
为了实现向零取整:Compute as
二进制表示方法:
float
类型可以精确到小数点后6位,double
类型可以精确到小数点后15位计算机中的浮点数二进制表示:
数字形式:
浮点数的类型: 规格化浮点数(Normalized),非规格化浮点数(Denormalized),一些特殊值
先计算出
,其中 为exp域的位数 区分Normalized和Denormalized,算出 或 或
满足条件:
真实的阶码值需要减去一个偏置(biased)量
frac域的第一位隐含为1(因此可以省去)
满足条件:
具体实例:
满足条件:
具体实例:
8位浮点数表示:exp域宽度为4 bits,frac域宽度为3 bits,bias=7
推广到单精度与双精度浮点数:
(几乎)可以直接使用无符号整数的比较方式 【反例】:必须先比较符号位 考虑+0,-0的特例 考虑NaN的问题(不考虑符号位的话,NaN比其他值都大,实际比较结果如何?——undefined behavior) 其他情况都可以直接使用无符号整数的比较方式:规格化 vs 非规格化,规格化 vs 无穷
各种舍入模式:Zero, Round down, Round up, Nearest Even
向偶数舍入(Round-To-Even) 也称“向最接近值的舍入”,其他方式会产生统计误差(statistically biased) 关键的设计决策是确定两个可能结果的中间数值的舍入
“Even”意味着最低有效数字需为0,而最低有效数字右侧的位串为100...
① 规格化 => ② 舍入 => ③ 调整/Postnormalize(舍入操作可能引起溢出)
int(32位宽),float(1+8+23=32位宽),double(1+11+52=64位宽)
double/float
可以表示的最大值要比int
能表示的最大值要大,此时发生溢出)或者浮点数是NaN,则转换结果没有定义:通常置为float
数值变大它会变“稀疏”)
(冯诺伊曼架构 / Control Flow)
寄存器 vs 内存:
生成汇编代码的命令行:gcc -Og -S sum.c #-fno-stack-protector
其中:-S
生成文件sum.s
,-c
的话生成sum.o
(即对象文件,内含机器指令)
xg++ -E main.cpp -o main.ii //Preprocess:预处理命令(处理#include和宏)
g++ -S main.ii -o main.s //Compile:编译为汇编代码
g++ -c main.s -o main.o //Assemble:编译成指令并保存在对象文件
//这三个合称“编译期”
g++ main.o -o main //Link:将对象文件和标准库链接为二进制可执行文件,“链接期”
//其他编译选项:
g++ test.cpp --std=c++11 -fno-elide-constructors -o test
//禁止编译器进行返回值优化
g++ test.cpp -g -o test -fsanitize=undefined,address
//查看非法访问、溢出、内存外泄的强大命令
//其它:
gcc编译C代码,g++编译C++代码
反汇编命令:objdump -S/-d a.o
, objdump -S/-d a
assembly_tutorial
通用寄存器 / General-Purpose Register
当参数少于7个时,参数从左到右放入寄存器:rdi, rsi, rdx, rcx, r8, r9。当参数为7个及以上时,前6个传送方式不变,但后面的依次“右向左”放入栈中
被调用者保存:汇编程序通常而言,在调用函数前,如果这些寄存器里存放了东西,需要把它们存到内存(栈)里
数据类型
数据格式
前缀(AT&T)
寄存器前被冠以%
立即数前被冠以$
十六进制数前被冠以0x
后缀
分别对应的寄存器名称为:al
/ab
, ax
, eax
, rax
[EG]:movl
表示操作32bit寄存器,movb
表示操作8bit寄存器
操作数方向
源操作在前,目的操作数在后
[EG]:movl $0x42, %ah
:把立即数$0x42
放到寄存器%ah
中
对于内存单元操作数来说,在AT&T中是把寄存器用()
括起来
[EG]:movl %ebx, 8(%si)
:将%ebx
寄存器里的值放到内存地址是8(%si)
的内存单元上,相当于intel汇编中的[si + 8]
运算指令
xxxxxxxxxx
addl %edx, %eax //edx+eax,结果放到eax
add $5, %r10 //5+r10,结果放到r10
inc %r10 //r10加1
mul %r10 //将rax乘以r10,将结果放到rax中,溢出部分放到rdx
div %r10 //rax除以r10,商放到rax,余数放到rdx
拷贝指令
xxxxxxxxxx
mov %r10,%r11 //将r10寄存器的值赋值给r11 ;
mov $99,%r10 //将立即数99赋值给r10寄存器;
mov %r10,(%r11) //将r10的值拷贝到r11寄存器中的数值指向的内存地址上;
mov (%r10),%r11 //将r10中数值指向的内存地址上的内容拷贝到r11;
push %r10 //将r10的值放到栈上 ;
pop %r10 //将栈顶的值pop到r10寄存器上。
流程控制指令
xxxxxxxxxx
cmp %r10,%r11 //比较r10和r11,根据比较结果来设置CPU的状态寄存器,从而影响后面的jump语句;
cmp $99,%r11 //比较99和r11,根据比较结果来设置CPU的状态寄存器,从而影响后面的jump语句;
jmp label //跳转到label
je label //如果相等,跳转到label
jne label //如果不相等,跳转到label
jl label //如果小于,跳转到label
jg label //如果大于,跳转到label
call label //调用函数
ret //从函数调用返回
syscall //系统调用 (32位模式下, 使用"int $0x80" 软中断)
汇编指示符(assembler directive)
编译出的汇编文件的开始处,有一段由句号.
开头的指令,是AT&T 汇编指示符。这里命令名的其余是字母,通常使用小写。这些不会被翻译成机器指令,而是给汇编器一些特殊指示,称为汇编指示(Assembler Directive)或伪操作(Pseudo-operation)
.byte
表达式
可不带参数或者带多个表达式参数,表达式之间由逗号分隔,每个表达式参数都被会变成下一个字节.word
表达式
这个表达式表示任意一节中的一个或多个表达式,表达式占一个字(两个字节).file
字符(string)
通告编译器我们准备开启一个新的逻辑文件,string是新文件名。.text
小节(subsection)
通知as编译器把后续语句汇编到编号为subsection的正文子段的末尾,subsection是一个纯粹的表达式.code16
:告诉编译器生成16位的指令.globl
:定义全局符号.fill repeat, size, value
每个汇编代码块的开始都有以下一样的汇编指令:
xxxxxxxxxx
pushl %ebp //将调用者的栈底部地址保存起来
movl %esp, %ebp //将调用者的栈顶部地址设置为本栈帧的底部
subq $18, %rsp //为变量申请内存空间,开辟了24字节
可以简单的理解为:保存上一个函数的栈底指针,然后重新开始一个新的栈
mov指令集:movb
, movw
, movl
, movq
,movslq
(将一个双字节符l
拓展移动到四字地址q
中),movzwl
(读取一个字word然后将其拓展到双字节l
)
MOVS类指令:将较小的源复制到较大的目的时使用,通过符号扩展来填充 MOVZ类指令:将较小的源复制到较大的目的时使用,通过剩余字节补0来填充
基本语法:movq Source, Dest
操作数类型
$0x400
, $-533
):$
%
(%rax)
):%
间接寻址
格式:movq (%rcx), %rax
基址+偏移量寻址
格式:movq 8(%rbp), %rdx
变址模式
格式:
内存地址:
[EG: 注意 常数不用加$
,%rdx == 0xf000, %rcx == 0x100]
X86-64:默认指针参数类型为64位,(如果使用int
数据类型,被操作的数据仍是32位)
基本语法:leaq Src, Dest
其中,translation of p = &x[i];
进行long m12(long x) { return x * 12; }
==>
leaq(%rdi, %rdi, 2), %rax # t <= x+x*2
salq $2, %rax # return t<<2
t3
]
At&T Format 和 Intel / Microsoft Format:别名关系
GAS / GNU Assembler:汇编器
leal
/leaq
指令)
处理器:从启动到关闭,CPU读取和执行(解释)一个指令序列,一次一条——这个执行顺序被称为CPU的控制流 当前执行程序的信息: 临时数据:%rax... 栈顶地址:%rsp... 当前指令地址/下一条:%rip...(X86架构下的PC) 条件码:CF, ZF, SF, OF 改变控制流的两种机制:Jump and branch, Call and return
条件码(由算术指令隐式设置)
CF: Carry Flag(进位) ZF: Zero Flag SF: Sign Flag (negative if true) OF: Overflow Flag
addq Src, Dest
CF 进位标志:可用于检测无符号整数运算的溢出 SF set if t < 0 ZF set if t == 0 OF set if 补码运算溢出(即带符号整数运算)
leaq指令不设置条件码
cmpq Src2, Src1 (≈计算Src1-Src2)
CF set if carry out from most significant bit:可用于无符号数的比较 ZF set if a == b SF set if (a-b) < 0 OF set if two's complement overflow:补码计算溢出
Testq Src2, Src1
计算Src1 & Src2并设置相应的条件码,但是不改变目的操作数
ZF set when a&b == 0 SF set when a&b < 0
CF = 0, OF = 0
SetX指令 读取当前的条件码(或者某些条件码的组合),并存入目的字节寄存器(余下的7个字节不会被修改)
[EG/通常使用movzbl
指令对目的寄存器进行“0”扩展]
依赖当前的条件码选择下一条执行语句(是否顺序执行)
[EG/编译指令:gcc -Og -S -fno-if-conversion control.c
]
C allows goto statement. Jump to position designated by label.
xxxxxxxxxx
long absdiff_j(long x, long y) {
long result;
int ntest = (x <= y);
if (ntest) goto Else;
result = x - y;
goto Done;
Else:
result = y - x;
Done:
return result;
}
语义:
条件跳转指令对于现代流水线处理器的执行效率有很大的负面影响 条件移动指令可以避免这一现象
[EG]
条件转移指令的局限性
- Then-Expr或Else-Expr表达式有“副作用”
- Then-Expr或Else-Expr表达式的计算量较大
处理器流水线(五级流水示例):
Instruction Fetch (IF) Read Registers (RD) Arithmetic Operation (ALU) Memory Access (MEM) Write Back (WB)
现代的通用处理器,支持深度流水线以及多发射结构 条件跳转指令往往会引起一定的性能损失,因此需要尽量消除
do while
& while
for
循环
switch
语句:跳转表
Direct:
jmp .L8
/ jump target is denoted by label .L8 Indirect:jmp *.L4(, %rdi, 8)
start of jump table: .L4 must scale by factor of 8 (addresses are by 8 bytes) fetch target from effective Address .L4 + x * 8 (only for) 间接跳转解决方案不一样,需要linker
处理 Fall-Through
跳转表的适用范围:其它情况下会使用平衡二叉树的结构,提升性能
(memory management相关)
函数调用的作用与工作机理
Passing control: 代码调用与返回 Passing data: 函数参数与返回值 Memory management: 调用所需的内存分配/返回时释放
上述机制都是用机器指令实现的
X86-64的程序栈
Push Src
从Src
取得操作数写入栈顶地址( )
popq Dest
读取栈顶数据() 写入 Dest
call label
, 将返回地址压入栈,跳转至labelret
,跳转至栈顶的返回地址存储的内容:局部变量、返回地址、临时空间
栈帧的分配与释放 进入函数后先分配栈帧空间:set-up code 函数返回时释放:finish code 寄存器%rbp指向当前栈帧的起始地址
当前栈帧的内容(x86-64/Linux)
子函数参数(Argument build) 局部变量(因为通用寄存器个数有限) 被保存的寄存器值 父函数的栈帧起始地址(可选)
父函数的栈帧中与当前函数相关的内容
返回地址:由
call
指令存入 当前函数的输入参数
操作特性:一次性分配整个帧;释放简单
- 将
减去某个值(栈帧的大小) - 对于栈帧内容的访问都是基于
完成的 - 可以延迟分配:直接访问不超过当前栈指针(
)128字节的栈上空间 直接加上某个值(栈帧大小)即可释放
通用寄存器分为两类: 调用者负责保存:Caller在调用子函数之前将这些寄存器内容存储在它的栈帧内,调用后恢复 被调用者负责保存:Callee在使用寄存器之前将其原有内容存储在它的栈帧内,退出前恢复
volatile变量:强制使用栈空间(直接存取原始内存地址);但在实际使用中没有修改栈顶寄存器%rsp
通常而言,局部变量一般:① 被优化 ② 寄存器 ③ 运行栈,volatile相当于保证了③
Red Zone: The 128-byte area beyond the location pointed to by %rsp is considered to be reserved.
RIP-relative addressing
[EG] 全局变量的内存地址位置采用相对于
编译选项:-m32
传参直接压栈(对比X86-64会采用寄存器存储)
只有8个通用寄存器,caller-callee关系如下:
一维数组:
T A[L]
基本数据类型:T, 数组长度:L 连续存储在大小为L * sizeof(T)
字节的空间内
二维数组
嵌套数组:
T A[R][C]
(R行C列) 数组所占据的空间大小:(T元素占据K字节) 行优先存储(Row-Major Ordering)
Multi-Level Array / Nested Array 变量univ是一个指针数组,数组长度为3,数组元素长度为8 bytes(
int* univ[3] = {a, b, c}
) 每个指针指向一个整数数组 与嵌套数组的区别:
数组维度大小的确定
固定大小
编译时就知道数组的维度,也被称为静态数组
可变大小
运行时才知道数组的维度,也被称为动态数组
静态数组与动态数组访问优化的对比
结构的内存存储分布(Memory Layout) 连续分配的内存区域,内部各个域通过名字访问 各个域依声明顺序存放 结构的整体大小与各个域的位置布局由编译器决定(汇编层面不了解结构/域等信息)
- 对齐的一般原则 已知某种基本数据类型的大小为
字节,那么其存储地址必须是 的整数倍(X86-64) 计算机访问内存一般是以内存块为单位的,块的大小是地址对齐的,如4/8/16字节对齐等(如果数据访问地址跨越“块”边界会引起额外的内存访问) - 编译器的工作 在结构的各个元素间插入额外空间来满足不同元素的对齐要求
对齐要求
基本数据类型
1 byte
(e.g. char
):N/A
2 bytes
(e.g. short
):地址最后一位为0
4 bytes
(e.g. int
, float
):地址最后二位为0
8 bytes
(e.g. double
, char*
):地址最后三位为0
16 bytes
(e.g. long double
):地址最后四位为0
结构的存储对齐要求 结构起始地址的对齐要求等同于该结构各个元素中对齐要求最高的那个 结构中元素的对齐要求必须满足该元素自身的对齐要求 结构的长度必须是该结构各个元素中对齐要求最高的那个元素长度的整数倍
example:
struct S1 {char c; int i[2]; double v;}
——[K =8]S1如果是一个数组,那么数组中的每个元素(结构体)的地址是8的整数倍 i必须是4对齐
数组的每个元素在结构中的相对地址在编译时就已确定
结构数组:整体结构长度乘以数组长度,满足每个元素的对齐要求
访问结构数组中的元素: 首先计算数组元素(即结构)的地址 然后访问该结构中的原元素
结构内元素不同,字节浪费情况也不相同
union / 联合
union中可以定义多个成员,union的大小由最大的成员的大小决定 union成员共享同一块大小的内存,一次只能使用其中的一个成员
数据、函数、指令的“定位”:
xxxxxxxxxx
g++ -E main.cpp -o main.ii //Preprocess:预处理命令(处理#include和宏)
g++ -S main.ii -o main.s //Compile:编译为汇编代码
g++ -c main.s -o main.o //Assemble:编译成指令并保存在对象文件
//这三个合称“编译期”
g++ main.o -o main //Link:将对象文件和标准库链接为二进制可执行文件,“链接期”
源文件 => 编译生成可重定位对象(Relocatable Object)文件 => 完全链接后产生可执行对象文件(executable object file)
程序链接的作用:
define symbol:
void swap() {...}
reference symbol:swap();
define symbol and reference:int *xp = &x;
Symbol table is an array of structs Each entry includes name, size and location of symbol
.o
文件)/ Relocatable object file
含有一定格式的代码与数据内容,可以与其它重定向对象文件一起集成为执行文件
一个.o
文件由唯一的一个源文件生成.out
文件)/ Executable object file
含有一定格式的代码与数据内容,可以直接被装载入内存并执行.so
文件)/ Shared object file
特殊类型的重定向对象文件,可以被装载入内存后进行动态链接;链接可以在装载时或者运行时完成
Windows系统下被称为DLL
文件标准二进制格式:Executable and Linkable Format (ELF) 最初由AT&T System V Unix系统采用,后来被广泛采用
Windows uses the Portable Executable (PE) format Mac OS-X uses the Mach-O format Modern x86-64 Linux and Unix systems use Executable and Linkable Format
readelf main.o -h
:显示ELF header信息
readelf main.o -S
:显示section headers信息
readelf main.o -e
:相当于-h, -l, -S
readelf main.o -s
:显示.symtab
段的信息
readelf main.o -r
:显示重定位段信息
readelf main.o -a
:显示所有信息
In Section 7.5, we saw how the compiler assigns symbols to COMMON and .bss using a seemingly arbitrary convention. Actually, this convention is due to the fact that in some cases the linker allows multiple modules to define global symbols with the same name. When the compiler is translating some module and encounters a weak global symbol, say, x, it does not know if other modules also define x, and if so, it cannot predict which of the multiple instances of x the linker might choose. So the compiler defers the decision to the linker by assigning x to COMMON. On the other hand, if x is initialized to zero, then it is a strong symbol (and thus must be unique by rule 2), so the compiler can confidently assign it to .bss. Similarly, static symbols are unique by construction, so the compiler can confidently assign them to either .data or .bss.
ELF header
Word size, byte ordering, file type (.o
, exec
, .so
), machine type, etc.
begin with: a 16-byte sequence that describes the word size and byte ordering of the system that generated the file
rest: information that allows a linker to parse and interpret the object file (size of the ELF header, object file type, machine type, file offset of the section header table, size of various sections described by the section header table)
Segment header table Page size, virtual addresses memory segments (sections), segment sizes.
.text section Code (functions)
.rodata section Read only data: format strings, jump tables, ...
.data section Initialized global variables, initialized static C variables (not zero)
Local C variables are maintained at run time on the stack and do not appear in either the
.data
or.bss
sections
.bss section (better save space) Uninitialized static C variables, global or static variables that are initialized to zero Has section header but occupies no space (merely a placeholder). At run time, they are allocated in memory with an initial value of zero.
Uninitialized global variable is classified as COMMON
.symtab section Symbol table with information about functions and global variables defined and referenced in the program (no local variables)
m
e.g. nonstatic C 函数,nonstatic 全局变量m
e.g. nonstatic C 函数,nonstatic 全局变量.data
或.bss
分配空间并且在symbol tables里分配独一无二的别名(e.g. 两个函数分别定义static int x = 0
,则在汇编器中定义的别名不同)Symbol tables are built by assemblers, using symbols exported by the compiler into the assembly-language
.s
file.
.text
section
Addresses of instructions that will need to be modified in the executable Instructions for modifying.
usually ommited in executable object files.data
section
Addresses of pointer data that will need to be modified in the merged executable-g
option).text
section (-g
option).symtab
and .debug
sections and for the section names in the section headers
Strong: procedures and initialized globals Weak: unintialized globals
Rule.1 Multiple strong symbols with the same name are not allowed. Otherwise: Linker error
Rule.2 Given a strong symbol and multiple weak symbols with the same name, choose the strong symbol.
Rule.3 Given multiple weak symbols with the same name, choose any of the weak symbols.
can override this with gcc -fno-common
- Relocating sections and symbol definitions merge sections from relocatable object files to executable object files assign run-time memory addresses to the new aggregate sections, to each section defined by the input modules, and to each symbol defined by the input modules
- Relocating symbol references within sections modify every symbol reference in the bodies of the code and data sections so that they point to the correct run-time addresses
these two relocation types support the x86-64 small code model programs larger than 2GB can be compiled using the
-mcmodel=medium/large
flags
objdump -dx main.o
: disassembled code
重定位算法:
重定位 PC-relative references
重定位 Abosolute references
Result:
e.g. 重定位信息(main.o
, sum.o
)
此处,为什么“inter_sum - 0x4":
- 机器码
e8
对应的含义是call
- 该行占据的指令数为5bytes,因此执行到改行时,PC = 0x400507 + 0x5 = 0x40050c
- 后续的4组数字/4bytes为跳转函数的地址与当前地址的offset。且由于LSB (Least Significant Byte, 低位字节占据高地址),此处实际表示的offset = 0xffffffca
- 因此计算得到函数的地址为PC + offset = 0x40050c + 0xffffffca = 0x4004d6
- -4:按照
计算的结果需要减去4才是正确结果 这里,400508为400507跳过callq指令的机器码
a typical executable file:
the ELF header includes the program’s entry point, which is the address of the first instruction to execute when the program runs
Segment header table
From the program header table, we see that two memory segments will be initialized with the contents of the executable object file. Lines 1 and 2 tell us that the first segment (the code segment) has read/execute permissions, starts at memory address 0x400000, has a total size in memory of 0x69c bytes, and is initialized with the first 0x69c bytes of the executable object file, which includes the ELF header, the program header table, and the .init, .text, and .rodata sections.
Lines 3 and 4 tell us that the second segment (the data segment) has read/write permissions, starts at memory address 0x600df8, has a total memory size of 0x230 bytes, and is initialized with the 0x228 bytes in the .data section starting at offset 0xdf8 in the object file. The remaining 8 bytes in the segment correspond to .bss data that will be initialized to zero at run time.
the .init section defines a small function, called _init, that will be called by the program’s initialization code
the .text, .rodata and .data sections are similar to those in a relocatable object file, except that these sections have been relocated to their eventual run-time memory addresses
since the executable is fully linked (relocated), it needs no .rel sections
For any segment, the linker must choose a starting address, vaddr, such that
Since
./main
does not correspond to a built-in shell command, the shell assumes thatmain
is an executable object file, which it runs for us by invoking some memory-resident operating system code known as the loader. Any Linux program can invoke the loader by calling the execve function. The loader copies the code and data in the executable object file from disk into memory and then runs the program by jumping to its first instruction, or entry point. This process of copying the program into memory and then running it is known as loading.
Next, the loader jumps to the program’s entry point, which is always the address of the _start function. This function is defined in the system object file
crt1.o
and is the same for all C programs. The _start function calls the system startup function, _libc_start main, which is defined inlibc.so
. It initializes the execution environment, calls the user-level main function, handles its return value, and if necessary returns control to the kernel.
static library vs shared library
.a
文件)
- 将多个相关的重定位对象文件集成为一个单一的带索引的文件 (称为归档文件, archive file)
- 增强链接器的功能使之能够在归档文件中解析外部符号
- 如果归档文件中的某个成员解析了外部符号,就将其链接入执行文件
libc.a
(the C standard library)
8MB archive of 1392 object files I/O, memory allocation, signal handling, string handling, data and time, random numbers, integer math
libm.a
(the C math library)
1MB archive of 401 object files floating point math (sin, cos, tan, log, exp, sqrt, ...)
静态库文件有其劣势
执行文件中会重复包含有所需的库文件函数或者数据 运行时内存中也会有重复部分 库文件的细微变动需要所有相关执行文件进行重链接
更好的方案:共享库文件方式
特殊类型的重定向对象文件,可以被装载入内存后进行动态链接;链接可以在装载时或者运行时完成 Windows系统下被称为DLL文件
A shared library is an object module that, at either run time or load time, can be loaded at an arbitrary memory address and linked with a program in memory. This process is known as dynamic linking and is performed by a program called a dynamic linker. Shared libraries are also referred to as shared objects, and on Linux systems they are indicated by the
.so
suffix. Microsoft operating systems make heavy use of shared libraries, which they refer to as DLLs (dynamic link libraries)
- one
.so
file for a particular library- a
.text
section in memory shared by different running processes
build a shared library libvector.so
linux> gcc -shared -fpic -o libvector.so addvec.c multvec.c
(-shared
: direct the loader to create a shared object file; -fpic
: directs the compiler to generate position-independent code)
link to program
linux> gcc main2.c ./libvector.so -o prog2l
This creates an executable object file prog2l
in a form that can be linked with libvector.so
at run time.
None of the code or data sections from
libvector.so
are actually copied into the executableprog2l
at this point. Instead, the linker copies some relocation and symbol table information that will allow references to code and data inlibvector.so
to be resolved at load time
prog2l
contains a.interp
section, which contains the path name of the dynamic linker, which is itself a shared object (e.g., ld-linux.so on Linux systems). The loader loads and runs the dynamic linker
dynamic linker: fully linked executable in memory
relocate the text and data of
libc.so
into some memory segment. relocate the text and data oflibvector.so
into another memory segment. relocate any references inprog2l
to symbols defined bylibc
.so andlibvector.so
Linux进程的内存布局(X86-64)
Stack Runtime stack (8MB limit) 栈顶地址:0x 0000 7fff ffff ffff e.g. local variables
Heap
按需动态分配
when call malloc()
, calloc()
, new()
——libC库提供
操作系统只能管理Heap的堆顶调度 LibC库提供管理整个Heap的方法
Data 静态分配的数据空间 e.g. global vars, static vars, string constants
Text / Shared Libraries 指令(代码) Read-only
其实,Heap和Data没有一条泾渭分明的分界线
易受攻击的缓冲区相关代码
xxxxxxxxxx
/* Echo Line */
void echo() {
char buf[4]; /* Way too small! */
gets(buf);
puts(buf);
}
void call_echo() {
echo();
}20221107101421023
输入字符串包含可执行代码的字节表示 用缓冲区B的地址覆盖函数返回地址A 当程序执行ret时,将跳转到“注入代码” (绝对地址)
avoid overflow vulnerabilities
不要用gets()
, puts()
这种函数
使用限制字符串长度的库函数 fgets instead of gets strncpy instead of strcpy Don’t use scanf with %s conversion specification use fgets to read the string or use %ns where n is a suitable integer
employ system-level protections
随机的stack base地址
(处理器支持的)不可执行段 mark region of memory as either "read-only" or "writeable" X86-64 added explicit "execute" permission stack marked as non-executable ——Any attempt to execute this code (exploit code) will fail!
have compiler use "stack canaries"
Stack Canaries (金丝雀) can help
Idea Place special value (“canary”) on stack just beyond buffer Check for corruption before exiting function
GCC Implementation -fstack-protector Now the default (disabled earlier)
Challenges for hackers Stack randomization makes it hard to predict buffer location Marking stack non-executable makes it hard to insert binary code
Alternative Strategy:
Using existing code (e.g. library code from stdlib) String together fragments to achieve overall desired outcome
Construct program from gadgets Gadget: Sequence of instructions ending in ret encoding by single byte 0xc3 Code positions fixed from run to run Code is executable
However, does not overcome stack canaries
协程:可以主动切出,并在之后可恢复运行的函数运行模式 协程的用途:在CPU进行IO操作时,切换到其它Ready函数的执行现场,并在合适时间点切换回来
冯诺依曼架构所面临的“存储墙”和“功耗墙”——存算分离
虚存是一个概念(而非实体) :一个连续的地址空间,进程的数据、代码、运行栈、堆等存于其中
物理地址:Physical Addressing 虚拟地址:Virtual Addressing (VM)
Advantages:
- uses main memory efficiently use DRAM as a cache for the parts of a virtual address space
- simplifies memory management each process gets the same uniform linear address space
- isolates address spaces one process can't interfere with another's memory user program cannot access privileged kernel information
Virtual memory is an array of N contiguous bytes stored on disk.
The contents of the array on disk are cached in physical memory (DRAM cache)
These cache blocks are called pages (size is
large page (block) size: typically 4-8 KB, sometimes 4 MB or larger
Page Tables / 页表:an array of page table entries (PTEs) that maps virtual pages to physical pages per-process kernel data structure in DRAM
"whether cached" + "DRAM offset"
Page Hit / 命中:reference to VM word that is in physical memory
Page Fault / 页缺失:reference to VM word that is not in physical memory page miss causes page fault (an exception) Page fault handler selects a victim to be evicted.
Allocating Pages: 一个Physical Page可能对应任意一个Virtual Page(全相联 fully associative)
工作原理:
- Virtual memory works because of locality(数据/指令的局部性)
- At any point in time, programs tend to access a set of active virtual pages called the working set (活跃页面的集合称为工作集)
- If (working set size < main memory size): Good performance for one process after compulsory misses
- If ( SUM(working set sizes) > main memory size ) : (Thrashing) Performance meltdown where pages are swapped (copied) in and out continuously
每个进程都有自己的私有虚拟空间
Mapping function scatters addresses through physical memmory 同一块物理地址可以被不同的virtual address映射——节省空间 / sharing code and data among processes
Extend PTEs with permission bits (PTE: page table entry) Page fault handler checks these before remapping If violated, send process SIGSEGV (segmentation fault)
不同的权限分配:
violating consequence: the kernel exception handler sends a SIGSEGV signal to the offending process (Linux reports this as a segmentation fault)
Basic Parameters
Components of
Components of
page hit is handled entirely by hardware
page fault requires cooperation between hardware and the operating system kernel
Small hardware cache in MMU Contains complete page table entries for small number of pages Maps virtual page numbers to physical page numbers
Components of a VPN that are used to access the TLB
TLB hit and TLB miss
TLB is 4-set(唯一的) and 4-way(任意的) set associative with 16 total entries.
[EG]:
Linux Virtual Memory System
The virtual memory of a Linux process
Linux Page Fault Handling
Area can be backed by...
Shared Object: two virtual memory Processes can share the same Physical memory object
Process 1 maps the shared object. Process 2 maps the same shared object. (Note that the physical pages are not necessarily contiguous.)
Since each object has a unique filename, the kernel can quickly determine that process 1 has already mapped this object and can point the page table entries in process 2 to the appropriate physical pages. The key point is that only a single copy of the shared object needs to be stored in physical memory, even though the object is mapped into multiple shared areas. For convenience, we have shown the physical pages as being contiguous, but of course this is not true in general.
Private Copy-on-Write (COW) Objects:
execve
Allocator maintains heap as collection of variable sized blocks, which are either allocated or free
Types of allocators:
Explicit allocator:
malloc
andfree
in C(库通过brk
等系统调用开辟heap空间,在其中初始化了相应地数据机构来实现动态内存分配)xxxxxxxxxx
int *p = (int*) malloc(n * sizeof(int)); // allocate a block
if (p == nullptr) {
perror("malloc");
exit(0);
}
for (int i = 0; i < n; ++i) p[i] = i;
free(p); // return p to the heap
Implicit allocator: garbage collection in Java, ML, and Lisp
内存分配器——一些要求
任意大小的block,高速,无冲突,对齐要求(8 bytes alignment for GNU malloc
),返回的是指针(一旦分配不能移动)
吞吐率 / Throughput:单位时间内完成的请求数
峰值内存利用率 / Peak memory utilization
Aggregate payload
is the sum of currently allocated payloads Current heap size , assumed monotonically nondecreasing Peak memory utilization (after requests)
Poor memory utilization caused by fragmentation
payload is smaller than block size
Thus we know how many memory to free given just a pointer.
First Fit
xxxxxxxxxx
p = start;
while ((p < end) // not passed end
&& (*p & 1) // already allocated
|| (*p <= len)) // too small
p = p + (*p & -2) // goto next block (word addressed)
Next Fit: search list starting where previous search finished Best Fit: choose the best free block (fits, with fewest bytes left over)
xxxxxxxxxx
void addblock(ptr p, int len) {
int newsize = ((len + 1) >> 1) << 1; // round up to even
int oldsize = *p & -2; // mask out low bit
*p = newsize | 1; // set new length
if (newsize < oldsize)
*(p + newsize) = oldsize - newsize; // set length in remaining part of block
}
xxxxxxxxxx
void free_block(ptr p) {
*p = *p & -2; // need only clear the "allocated" flag
}
Join with next/previous blocks:
xxxxxxxxxx
void free_block(ptr p) {
*p = *p & -2; // clear allocated flag
next = p + *p;
if ((*next & 1) == 0) // if next block isn't allocated
*p = *p + *next; // add to this block
}
Bidirectional Coalescing / 双向
Boundary Tags [Knuth73]
然而Boundary Tages会产生内部碎片。
异常:各个任务切换的重要机制
Changing Control Flow:
Exceptions: transfer of control to the OS in response to some event (从用户态到核心态 user => kernel)
Examples: div by 0, arithmetic overflow, page fault, I/O request completes, Ctrl-C...
processor pushses additional processor state onto the stack necessary to restart the interrupted program when the handler returns
Control transferred from user program to the kernel: items pushed on the kernel's stack rather than onto the user's stack
Interupt Vectors(中断向量/异常处理向量)
each type of event has unique exception number
handler is called each time exception occurs exception table: used to form the address of the appropriate exception handler exception table base register: contains the starting address of the exception table
Asynchronous Exceptions(异步异常/中断/Interrputs)
Caused by events external to the processor(外设触发) indicated by setting the processor's interrupt pin handler returns to "next" instruction
Examples:
Synchronous Exceptions(同步异常)
Caused by events that occur as a result of executing an instruction(由指令执行引起). This instruction is referred as the faulting instruction
Traps Intentional(程序故意引起的): system calls, breakpoint traps, special instructions Returns control to "next" instruction
reading a file (read) creating a new process (fork) loading a new program (execve) terminating the current process (exit)
Faults Unintentional but possibly recoverable(程序无意引起的): page faults (recoverable), protection faults (unrecoverable), floating point excetions Either re-executes faulting ("current") instruction or aborts
Aborts Unintentional and unrecoverable: parity error, machine check Aborts current program
System calls are provided on x86-64 systems via a trapping instruction called syscall. It is quite interesting to study how programs can use this instruction to invoke Linux system calls directly. All arguments to Linux system calls are passed through general-purpose registers rather than the stack. By convention, register %rax contains the syscall number, with up to six arguments in %rdi, %rsi, %rdx, %r10, %r8, and %r9. The first argument is in %rdi, the second in %rsi, and so on. On return from the system call, registers %rcx and %r11 are destroyed, and %rax contains the return value. A negative return value between −4,095 and −1 indicates an error corresponding to negative errno.
A process is an instance of a running program(程序的一个运行实例)
Process provides each program with two key abstractions:
maintained by... 多进程交错运行、快速切换 and 由虚拟内存系统管理地址空间
Save current registers in memory
Schedule next process for execution
Load saved registers and switch address space (context switch)
(with one CPU core)
Two processes run concurrently if they flows overlap in time ——Otherwise, they are sequential
Processes are managed by a shared chunk of OS code called the kernel [Important]: The kernel is NOT a separate process, but rather runs as part of some user process
creates a new process (child process) that is identical to the calling process (parent process) return 0 to the child process return child's pid to the parent process
fork is called once but returns twice
create VM for new process
create exact copies of current mm_struct
, vm_area_struct
and page tables
flag each page in both processes as read-only
flag each vm_area_struct
in both processes as private COW
fork的机制
exit:终止当前进程
atexit()
: registers functions to be executed upon exit
xxxxxxxxxx
void cleanup(void) {
printf("cleaning up\n");
}
void fork() {
atexit(cleanup);
fork();
exit(0);
}
子进程比父进程先结束,进程终止后仍然会消耗系统资源:various tables maintained by OS
Reaping / 回收:父进程对终止的子进程进行回收(父进程获取子进程的退出状态信息),内核释放子进程的剩余资源
如果父进程没有回收子进程,那么子进程会被init process回收
挂起 / wait:synchronizing with children
xxxxxxxxxx
int wait(int* child_status)
挂起当前进程直到它的一个子进程终止(如果没有子进程,直接返回-1)
返回值是终止的子进程的pid
if child_status != NULL, then the object it points to will be set to a status indicating why the child process terminated(获得子进程终止的原因,正常终止的话是子进程的退出状态值)——void exit(int status)
xxxxxxxxxx
pid_t wait(int& status); // random pid
// == waitpid(-1, status, 0)
pid_t waitpid(pid_t pid, int& status, int options) // specific pid
// pid == -1 means the wait set consists all of the parent's child processes, pid > 0 then wait set is the singleton child process whose process ID is equal to pid
// return PID of child if OK, 0 if WNOHANG, -1 on error (signal / has no children)
宏:WIFEXITED, WEXITSTATUS:
xxxxxxxxxx
pid_t wpid = wait(&child_status);
if (WIFEXITED(child_status))
printf("Child %d terminated with exit status %d\n", wpid, WEXITSTATUS(child_status));
else printf("Child %d terminate abnormally\n", wpid);
loading and running programs in the context of the current process
xxxxxxxxxx
int execve(const char* filename, const char* argv[], const char* envp[]);
execve is called once and never return The execve function loads and runs the executable object file filename with the argument list argv and the environment variable list envp (
name = value
strings). Execve returns to the calling program only if there is an error, such as not being able to find filename.
getenv()
, setenv()
, unsetenv()
xxxxxxxxxx
char* getenv(const char* name); // return pointer to name if exists
int setenv(const char* name, const char* newvalue, int overwrite);
// return 0 on success, -1 on error
void unsetenv(const char* name);
shell: an interactive application-level program that runs other programs on behalf of the user.(“提供用户使用界面”的程序)
the original shell was the
sh
program, followed bycsh
,tcsh
,ksh
andbash
perform a sequence of read/evaluate steps and then terminatesmain routine:
ps -l
F:进程的权限标志 S:进程的状态(stat),主要状态有:R (running), S (sleep), T (stop), D (不可被唤醒), Z (zombie) UID / PID / PPID:此进程被UID所拥有 / 进程的PID号 / 此进程的父进程的PID PRI / NI:priority/nice的缩写,表示该进程被CPU所执行的优先顺序(越小越优先) ADDR / SZ / WCHAN:该进程在内存的部分 / 此进程用掉多少内存 / 目前进程是否在工作 TTY:登入这的终端机位置,若为远程登入则使用动态终端介面(pts/n) TIME:使用掉的CPU时间 CMD:指令
ps -t
jobs
查看挂起的进程信息
background job: [command] &
不需要等待进程结束
前台程序 correctly waits for and reaps foreground jobs 但是,后台程序终止时会变成“僵尸”(因为可能永远不会被回收) 这会导致内存泄漏,导致OS内核资源不足(一旦超过了进程配额 process quota),shell就不能运行任何新命令,即
fork()
返回-1xxxxxxxxxx
ulimit -u // bash syntax 查询进程配额
当一个后台进程完成时,OS内核将采取一种类似“异常”(ECF)的机制来“提醒“ shell。这一机制被称为信号 signal。
Definition: A signal is a small message that notifies a process that an event of some type has occurred in the system
类似于“异常” 其类型由小整数ID表示(1-30)
kill -S -id
由内核(有时在另一个进程的请求下)发送给进程 传递的唯一的信息是其自身(即ID和它到达这一事实)
Default Action 还包括:暂停(直至恢复)
Sending a Signal
Kernel sends (delivers) a signal to a destination process by updating some state in the context of the destination process
Receiving a Signal
A destination process receives a signal when it is forced by the kernel to react in some way to the delivery of the signal
Pending and Blocked Signals
pending:信号已发送但暂未被接收
每一种信号同时最多存在一个pending signal signals并非队列排序:若某进程当前已有pending signal,则后续向该进程发送的该种类的信号将被discarded
Kernel maintains pending and blocked bit vectors in the context of each process 当type k信号在pending时set bit k 当type k信号接收时clear bit k
blocked:一个被目标进程阻塞的信号可以被递送(delivered),但是在信号被解除阻塞之前不会被接收(received)
通过sigprocmask方程进行set和clear操作(Explicit blocking mechanism)
xxxxxxxxxx
int sigprocmask(int how, const sigset_t *set, sigset_t *oldset);
int sigemptyset(sigset_t *set);
int sigfillset(sigset_t *set);
int sigaddset(sigset_t *set, int signum);
int sigdelset(sigset_t *set, int signum);
// Returns: 0 if OK, −1 on error
int sigismember(const sigset_t *set, int signum);
// Returns: 1 if member, 0 if not, −1 on error
Process Groups 进程组
每一个进程属于一个进程组,用pgid
表示group id
xxxxxxxxxx
pid_t getpgrp(void)
: return process group of current process
int setpgid(pid_t pid, pid_t pgid)
: change process group of a process
Sending Signals
with /bin/kill Program
/bin/kill program sends arbitrary signal to a process or process group
Examples:
/bin/kill -9 24818
: send SIGKILL to process 24818/bin/kill -9 -24817
: send SIGKILL to every process in process group 24817
from the Keyboard
ctrl-c
/ctrl-z
sends a SIGINT / SIGTSTP to every job in the foreground process group (fg num
/bg num
恢复进程到前/后台执行,其中num表示jobs
中查询到的数字) SIGINT: default action is to terminate each process SIGTSTP: default action is to stop / suspend each process
with kill Function
int kill(pid_t pid, int SIGINT)
: 则子进程为“不正常退出” / WIFEXITED == Falsepid == 0
: sends signal to every process in the process group of the calling processpid > 0
: sends signal to process pidpid < 0
: sends signal to every process in process group |pid|
Receiving Signals
信息的接收时间:进程被调度恢复执行的时候
pnb = pending & ~blocked
为真
xxxxxxxxxx
typedef void (*handler_t)(int);
handler_t signal(int signum, handler_t handler)
//handler的两种取值:(其他情况handler是signal handler的地址)
// SIG_IGN:忽略类型为signum的信号
// SIG_DFL:恢复ID为signum的信号的默认处理
// Return: pointer to previous handler if OK
// SIG_ERR on error
xxxxxxxxxx
void int_handler(int sig) {
safe_printf("Process %d received signal %d\n", getpid(), sig);
//Async-Signal-Safety的printf,稍后会讲到~
exit(0);
}
void fork13() {
pid_t pid[N];
int i, child_status;
signal(SIGINT, int_handler);
for (i = 0; i < N; i++)
if ((pid[i] = fork()) == 0) {
while(1); // child infinite loop
}
for (i = 0; i < N; i++) {
printf("Killing process %d\n", pid[i]);
kill(pid[i], SIGINT);
}
for (i = 0; i < N; i++) {
pid_t wpid = wait(&child_status);
if (WIFEXITED(child_status))
printf("Child %d terminated with exit status %d\n", wpid, WEXITSTATUS(child_status)); //正常退出
else
printf("Child %d terminated abnormally\n", wpid);
}
}
Signals Handlers as Concurrent Flows
A signal handler is a separate logical flow (not process) that runs concurrently with the main program
具有独立的stack与处理器上下文(CPU context / control flow) 内存是共享的
pending signals are not queued 因信号无法缓存,多个同类信号同时到达的话会丢失若干
living with nonqueuing Signals: must check for all terminate jobs
检查所有的终止子进程:wait
xxxxxxxxxx
while ((pid = waitpid(-1, &child_status, WNOHANG)) > 0) {
safe_printf("Received signal %d from process %d\n", sig, pid);
}
//对比:
// pid_t pid = wait(&child_status);
signal arrival during long system calls (say a read) 系统调用执行过程中信号到达了,signal handler会中断read call,signal handler结束后read call会继续自动进行
signal handler reacts to externally generated events
xxxxxxxxxx
void handler(int sig) {
safe_printf("You think hitting ctrl-c will stop the bomb?\n");
sleep(2);
safe_printf("Well...");
sleep(1);
printf("OK\n");
exit(0);
}
int main() {
signal(SIGINT, handler); /* installs ctrl-c handler */
while(1) {}
}
Function is async-signal-safe if either reentrant(仅有局部变量) or non-interruptible by signals 即当前的函数不会被signal所屏蔽掉 async-signal-safe wrapper for printf: safe_print
setjmp
& longjmp
xxxxxxxxxx
Powerful (but dangerous) user-level mechanism for transferring control to an arbitrary location Controlled to way to break the procedure call / return discipline Useful for error recovery and signal handling
efficient in deeply nested functions
int setjmp(jmp_buf env, int i)
must be called before longjmp
called once, returns one or more times
identifies a return site for a subsequent longjmp
implementation: remember where you are by storing the current register context, stack pointer and PC value in jmp_buf
and return 0
setjmp
的返回值不能赋给一个变量!!! 但可以用于switch
void longjmp(jmp_buf env, int i)
return from the setjmp
remembered by jump buffer
return i instead of 0
called after setjmp
called once, never return
implementation: restore register context, set %eax to i, jump to the location indicated by the PC stored in jmp_buf
Work with stack discipline: can only long jump to environment of function that has been called but not yet completed
sigsetjmp()
andsiglongjmp()
are versions ofsetjmp()
andlongjmp()
that can be used by signal handlers
fork()
xxxxxxxxxx
pid_t fork(void);
// return 0 to child, PID of child to parent, −1 on error
pid_t waitpid(pid_t pid, int *statusp, int options);
// return PID of child if OK, 0 (if WNOHANG), or −1 on error
pid_t wait(int *statusp);
// return PID of child if OK or −1 on error
sleep()
xxxxxxxxxx
unsigned int sleep(unsigned int secs);
// return seconds left to sleep
// The latter case (rtval > 0) is possible if the sleep function returns prematurely because it was interrupted by a signal
pause()
xxxxxxxxxx
int pause(void);
// return -1
execve()
xxxxxxxxxx
int execve(const char *filename, const char *argv[], const char *envp[]);
// does not return if OK; return −1 on error
char *getenv(const char *name);
// return pointer to name if it exists, NULL if no match
int setenv(const char *name, const char *newvalue, int overwrite);
// return 0 on success, −1 on error
void unsetenv(const char *name);
// return nothing
kill()
xxxxxxxxxx
int kill(pid_t pid, int sig);
// return 0 if OK, −1 on error
signal()
xxxxxxxxxx
typedef void (*sighandler_t)(int);
sighandler_t signal(int signum, sighandler_t handler);
// return pointer to previous handler if OK, SIG_ERR on error (does not set errno)
jump()
xxxxxxxxxx
int setjmp(jmp_buf env);
int sigsetjmp(sigjmp_buf env, int savesigs);
// return 0 from setjmp, nonzero from longjmps
void longjmp(jmp_buf env, int retval);
void siglongjmp(sigjmp_buf env, int retval);
// never returns
Definition:
A Unix file is a sequence of m bytes All I/O devices are represented as files(IO设备都视作文件)
/dev/sda2
(/usr disk partition)/dev/tty2
(terminal) Even the kernel is represented as a file/dev/kmem
(kernel memory image)/proc
(kernel data structures)
vim limits
vim maps
Unix File Types
Regular File
often classified into binary files and ASCII / Unicode characters files
\n
, ASCII =LF
, numeric_value =0x0a
File containing user/app data (binary, text, whatever) OS does not know anything about the format(文件格式 / 内容对OS透明)
Directory File A file that contains the names and locations of other files
Character special and block special files Terminals (character special) and disks (block special)
FIFO (named pipe) A file type used for interprocess communication
Socket A file type used for network communication between processes
directory hierarchy
absolute pathname:
/home/droh/hello.c
relative pathname: (current working directory)/home/droh
, (hello.c)./hello.c
summary
Key Features: Elegant mapping of files to devices allows kernel to export simple interface called Unix I/O Important idea: All input and output is handled in a consistent and uniform way
Basic Unix I/O operations / system calls:
Opening and closing files:
open()
andclose()
Reading and writing a file:
read()
andwrite()
Changing the current file offset/position (seek) indicates next offset into file to read or write
lseek()
xxxxxxxxxx
off_t lseek(int filedes, off_t offset, int whence);
// return: new offset if OK, -1 if error
// whence: SEEK_SET: set to offset
// SEEK_CUR: set to cur + offset
// SEEK_END: file_size() + offset
access file metadata:
stat()
xxxxxxxxxx
int open(char* filename, int flags, mode_t mode);
// return: new file descriptor if OK, -1 on error
// flags: O_RDONLY, O_WRONLY, O_RDWR, O_CREAT, O_TRUNC, O_APPEND
// mode: specifies access permission
int fd;
umask(DEF_UMASK);
if ((fd = open("/etc/hosts", O_RDONLLY, DEF_MODE)) < 0) {
// owner: read and write permissions, others: read permissions
perror("open");
exit(1);
}
Return a small identifying integer file descriptor(文件描述符) the descriptor returned is always the smallest descriptor that is not currently open in the process
fd == -1
:an error occurredEach process created by a Unix shell begins life with three open files associated with a terminal (files are recorded in a FD_TABLE) 0: standard input 1: standard output 2: standard error (diagnostic) output
xxxxxxxxxx
const: STDIN_FILENO, STDOUT_FILENO, STDERR_FILENO
xxxxxxxxxx
int close(int fd);
// Returns 0 if OK, -1 on error
int fd; /*file descriptor*/
int retval; /*return value*/
if ((retval = close(fd)) < 0) {
perror("close");
exit(1);
}
Closing an already closed file is an error Moral: always check return codes, even for seemingly benign functions such as
close()
读取文件内容并更新文件访问位置
xxxxxxxxxx
ssize_t read(int fd, void* buf, size_t n);
// Returns: number of bytes read if OK, 0 on EOF, -1 on error
char buf[512];
int fd; /*file descriptor*/
int nbytes; /*number of bytes read*/
/* Open file fd ... */
/* Then read up to 512 bytes from file fd */
if ( (nbytes = read(fd, buf, sizeof(buf))) < 0 ) {
perror("read"); //output to stderr
exit(1);
}
returns number of bytes read from file fd into buf
Return type ssize_t is signed integer nbytes < 0 indicates that an error occurred Short counts (nbytes < sizeof(buf)) are possible and are not errors
写入文件内容并更新文件访问位置
xxxxxxxxxx
ssize_t write(int fd, const void* buf, size_t n);
// Returns: number of bytes written if OK, -1 on error
char buf[512];
int fd; /*file descriptor*/
int nbytes; /*number of bytes read*/
/* Open the file fd ... */
/* Then write up to 512 bytes from buf to file fd*/
if ((nbytes = write(fd, buf, sizeof(buf)) < 0) {
perror("write");
exit(1);
}
returns number of bytes written from buf to file fd
nbytes < 0 indicates that an error occurred As with reads, short counts are possible and are not errors
Definition: metadata is data about data, in this case file data
Per-file metadata maintained by kernel Accessed by users with the stat and fstat functions
xxxxxxxxxx
int stat(const char* filename, struct stat* buf);
int fstat(int fd, struct stat* buf);
// Returns: 0 if OK, -1 on error
xxxxxxxxxx
int main (int argc, char **argv) {
struct stat buf;
char *type, *readok;
stat(argv[1], &buf);
if (S_ISREG(buf.st_mode)) type = "regular";
else if (S_ISDIR(buf.st_mode)) type = "directory";
else type = "other";
if ((buf.st_mode & S_IRUSR)) /* OK to read?*/ readok = "yes";
else readok = "no";
printf("type: %s, read: %s\n", type, readok);
exit(0);
}
accessing directories
xxxxxxxxxx
DIR* opendir(const char* name);
//Returns: pointer to handle if OK, NULL on error
struct dirent* readdir(DIR* dirp);
//Returns: pointer to next directory entry if OK, NULL is no more entries or error
{
DIR *directory;
struct dirent *de;
...
if (!(directory = opendir(dir_name)))
error("Failed to open directory");
...
while (0 != (de = readdir(directory))) {
printf("Found file: %s\n", de->d_name);
}
...
closedir(directory);
}
struct dirent {
long d_ino; /*inode number 索引节点号 */
off_t d_off; /* offset to this dirent 在目录文件中的偏移 */
unsigned short d_reclen; /* length of this d_name 文件名长 */
unsigned char d_type; /* the type of d_name 文件类型 */
char d_name [NAME_MAX+1]; /* file name(null结尾)文件名最长255字符*/
}
File sharing:
After call to fork
:
xxxxxxxxxx
int main() {
freopen("in.txt", "r", stdin); //输入重定向,输入数据将从in.txt中读取
freopen("out.txt", "w", stdout); //输出重定向,输出数据讲保存到out.txt
}
xxxxxxxxxx
int dup2(int oldfd, int newfd);
//Returns: nonnegative descriptor if OK, -1 on error
int dup(int fd); // fd1 = dup(fd4);
Example: call
dup(4, 1)
Remarks: 此时fd1和fd4连续调用read(fd, &buffer, 1)
得到的&buffer
将不同(因为其中一个调用完read
之后file pos
的值也改变了)
Remarks:几种可能的Descriptor table和Open file table的关系
- 一个进程的多个descriptor table entry一一对应Open file table中的不同项 这是最一般的情况(多次打开同一个文件也是如此)
- 不同进程的同一descriptor table entry对应Open file table中的同一项 子进程继承父进程
- 同一进程的不同descriptor table entry对应Open file table中的同一项 dup / dup2
The C standard library (libc.so
) contains a collection of higher-level standard I/O functions
Opening and closing files:
fopen
,fclose
Reading and writing bytes:fread
,fwrite
Reading and writing text lines:fgets
,fputs
Formatted reading and writing:fscanf
,fprintf
特点:流式文件处理(a stream is a pointer to a structure of type FILE) ANSI C program begins with three open streams: stdin, stdout, stderr
xxxxxxxxxx
extern FILE* stdin; // descriptor 0
extern FILE* stdout; // descriptor 1
extern FILE* stderr; // descriptor 2
通过fflush()
调用显示地flush到输出文件 / 检测到换行符输出
strace ./main
显示进程的系统调用信息
不能用在二进制文件上的函数:
基于行处理的IO:fgets
, scanf
, printf
, rio_readlineb
不同系统解释换行符的方式不同0x0a (
\n
): Linux和Mac OS X:LF(0x0a), [\n
] HTTP服务器 & Windows:CR+LF(0x0d 0x0a), [\r\n
] 使用rio_readn
或rio_readnb
替代
字符串处理函数:strlen
, strcpy
——解释字节0为串结尾
进程:Process = process context + code, data and stack 进程上下文(含处理器状态 以及 内核上下文)与虚存空间
线程:Process = thead + code, data and kernel context
线程thread 和 进程progress 的区别:有独立的处理器状态,和主进程共享内核上下文 线程thread 和 协程coroutine 的区别:前者OS调用,后者为用户调用thread
每个线程有自己的处理器状态:CPU context 每个线程有自己的线程编号:TID
并发(concurrency):一个处理器同时处理多个任务(逻辑上的同时发生,依赖于轮询) 并行(parallel):多个处理器或者多核处理器同时处理多个不同的任务(物理上的同时发生)
Pthreads: Standard interface for ~60 functions that manipulate threads from C programs
void *threadroutine(void *vargp)
pthread_create(pthread_t *tid, …, func *f, void *arg)
pthread_join(pthread_t tid, void **thread_return)
pthread_self()
pthread_cancel(pthread_t tid)
pthread_exit(void *thread_return)
return (in primary thread routine terminates the thread)
exit (terminates all threads)
xxxxxxxxxx
void *thread(void *vargp);
int main() {
pthread_t tid;
Pthread_create(&tid, NULL, thread, NULL);
// NULL = Thread attributes(usually NULL)
// NULL = Thread arguments(void *p)
Pthread_join(tid, NULL); // assigns return value(void **p)
// 是为了防止主线程没有给其他线程执行时间就返回而设计的
exit(0);
}
/* thread routine */
void *thread(void *vargp) {
printf("Hello, world!\n");
return NULL;
}
Pros: 易于在线程之间共享数据;线程比进程更有效率 Cons:无意的共享可能会导致微妙的、难以重现的错误
Definition:一个变量x是共享的,当且仅当多个线程引用x的某个实例
概念模型:
操作层面:
全局变量: 定义在所有函数外面的变量 虚拟内存包含全局变量的唯一实例 局部变量:定义在函数内部的变量,无static属性 每个线程栈包含每个局部变量的一个实例 局部静态变量:定义在函数内部,有static属性 虚拟内存包含局部静态变量的唯一实例
xxxxxxxxxx
char **ptr; /* global */
void *thread(void *vargp) {
int myid = (int)(__int64_t)vargp;
static int cnt = 0;
printf("[%d]: %s (svar=%d)\n", myid, ptr[myid], ++cnt);
}
int main() {
int i;
pthread_t tid;
char *msgs[2] = {
"Hello from foo",
"Hello from bar"
};
ptr = msgs;
for (i = 0; i < 2; i++)
pthread_create(&tid, NULL, thread, (void *)i);
pthread_exit(NULL);
}
/*
编译选项:链接-lpthread(通常只默认链接标准库-lstdc++)
g++ test.cpp -lpthread -o test
运行结果:
[0]: Hello from foo (svar=1)
[1]: Hello from bar (svar=2)
In this case:
ptr, cnt and msgs are shared
i and myid are not shared
*/
xxxxxxxxxx
//badcnt.c:失败的线程间同步的例子
void *thread(void *vargp) {
int i, niters = *((int *)vargp);
for (i = 0; i < niters; i++) cnt++;
return NULL;
}
volatile int cnt = 0; /* global */
int main(int argc, char **argv) {
int niters = atoi(argv[1]);
pthread_t tid1, tid2;
Pthread_create(&tid1, NULL, thread, &niters);
Pthread_create(&tid2, NULL, thread, &niters);
Pthread_join(tid1, NULL);
Pthread_join(tid2, NULL);
/* Check result */
if (cnt != (2 * niters))
printf("BOOM! cnt=%d\n”, cnt);
else
printf("OK cnt=%d\n", cnt);
exit(0);
}
/*
linux> ./badcnt 10000
OK cnt=20000
linux> ./badcnt 10000
BOOM! cnt=13056
*/
/*
更改汇编后重新编译:g++ test.s -lpthread -o test
#movl cnt(%rip), %eax // L: load cnt
#addl $1, %eax // U: update cnt
#movl %eax, cnt(%rip) // S: store cnt
incl cnt(%rip)
*/
We can analyze the behavior using a progress graph 进度图 A trajectory is a sequence of legal state transitions that describes one possible concurrent execution of threads. 轨迹描述了状态转换的顺序,它代表一种可能的线程并发执行方式。 critical section 临界区:w.r.t the shared variable cnt
We must synchronize 同步 the execution of the threads so that they never have an unsafe trajectory (i.e., need to guarantee mutually exclusive access to critical region)
经典方法:Semaphores 信息量 (Edsger Dijkstra)
Other Approaches: Mutex and condition variables (Pthreads) Monitors (Java)
Semaphore:非负整数的,用于同步的全局变量
由P、V操作组成:[ while (s == 0) wait(); s--; ]
, [ s++; ]
OS内核确保[]
之间的操作不可分割:一次只有一个P或者V操作修改s;只有P能减小s
Semaphore invariant: (s >= 0)
Pthreads functions:
xxxxxxxxxx
int sem_init(sem_t* sem, 0, unsigned int val); // s = val
int sem_wait(sem_t* s); // P(s)
int sem_post(sem_t *s); // V(s)
Basic idea:
P(mutex)
和V(mutex)
保护Terminology:
Binary semaphore 二值/元信号量 semaphore whose value is always 0 or 1
Mutex 互斥锁 binary semaphore used for mutual exclusion
Counting semaphore 计数信号量 used as a counter for set of available resources
xxxxxxxxxx
volatile int cnt = 0; /* Counter */
sem_t mutex; /* Semaphore that protects cnt */
Sem_init(&mutex, 0, 1); /* mutex = 1 */
// Surround critical section with P and V
void *thread(void *vargp) {
int i, niters = *((int *)vargp);
for (i = 0; i < niters; i++) {
P(&mutex);
cnt++;
V(&mutex);
}
}
However, it's much slower than badcnt.c
基本思想:线程使用信号量来通知另一个线程某些条件为真 使用计数信号量来跟踪资源使用状态,并通知其他线程 使用互斥信号量保护对共享资源的访问
典型的例子:生产者 / 消费者问题 The Producer-Consumer Problem
N个元素的生产者消费者问题:需要一个互斥信号量和两个计数信号量
mutex
:对缓冲区的互斥访问slots
:计算缓冲区中可用的槽位items
:计算缓冲区中可用的条目slots
+items
=n
结构体实现
xxxxxxxxxx
//sbuf.h
typedef struct {
int *buf; /* Buffer array */
int n; /* Maximum number of slots */
int front; /* buf[(front+1)%n] is first item */
int rear; /* buf[rear%n] is last item */
sem_t mutex; /* Protects accesses to buf */
sem_t slots; /* Counts available slots */
sem_t items; /* Counts available items */
} sbuf_t;
void sbuf_init(sbuf_t *sp, int n) {
sp->buf = Calloc(n, sizeof(int));
sp->n = n; /* Buffer holds max of n items */
sp->front = sp->rear = 0; /* Empty buffer iff front == rear */
Sem_init(&sp->mutex, 0, 1); /* Binary semaphore for locking */
Sem_init(&sp->slots, 0, n); /* buf has n empty slots */
Sem_init(&sp->items, 0, 0); /* Initially, buf has 0 items */
}
void sbuf_deinit(sbuf_t *sp) {
Free(sp->buf);
}
void sbuf_insert(sbuf_t *sp, int item) {
// insert item onto the rear of shared buffer sp
P(&sp->slots); /* Wait for available slot */
P(&sp->mutex); /* Lock the buffer */
sp->buf[(++sp->rear) % (sp->n)] = item; /* Insert the item */
V(&sp->mutex); /* Unlock the buffer */
V(&sp->items); /* Announce available item */
}
int sbuf_remove(sbuf_t *sp) {
// remove and return the first item from buffer sp
int item;
P(&sp->items); /* Wait for available item */
P(&sp->mutex); /* Lock the buffer */
item = sp->buf[(++sp->front) % (sp->n)]; /* Remove the item */
V(&sp->mutex); /* Unlock the buffer */
V(&sp->slots); /* Announce available slot */
return item;
}
补充:理发师理发问题
竞争:程序的正确性依赖于某种顺序
发生race的示例:
xxxxxxxxxx
// race.c
void *thread(void *vargp) {
int myid = *((int *)vargp);
printf("Hello from thread %d\n", myid);
return NULL;
}
int main() {
pthread_t tid[N];
int i;
for (i = 0; i < N; i++)
Pthread_create(&tid[i], NULL, thread, &i);
for (i = 0; i < N; i++)
Pthread_join(tid[i], NULL);
exit(0)
}
主线程对i的改变与子线程对vargp的引用之间发生竞争
race的消除:
xxxxxxxxxx
void *thread(void *vargp) {
int myid = *((int *)vargp);
Free(vargp);
printf("Hello from thread %d\n", myid);
return NULL;
}
int main() {
pthread_t tid[N];
int i, *ptr;
for (i = 0; i < N; i++) {
ptr = Malloc(sizeof(int));
*ptr = i;
Pthread_create(&tid[i], NULL, thread, ptr);
}
for (i = 0; i < N; i++)
Pthread_join(tid[i], NULL);
exit(0);
}
定义:等待一个永远不会发生的条件 经典场景:进程1和进程2需要两个资源A和B以继续,进程1获得了A等待B,进程2获得了B等待A
xxxxxxxxxx
int cnt = 0;
int NITERS;
sem_t mutex[2];
void *count(void *vargp) {
int i;
int id = (int)(__int64_t)vargp;
for (i = 0; i < NITERS; i++) {
sem_wait(&mutex[id]); sem_wait(&mutex[1-id]);
cnt++;
sem_post(&mutex[id]); sem_post(&mutex[1-id]);
}
return NULL;
}
int main(int argc, char** argv) {
NITERS = atoi(argv[1]);
pthread_t tid[2];
sem_init(&mutex[0], 0, 1); /* mutex[0] = 1 */
sem_init(&mutex[1], 0, 1); /* mutex[1] = 1 */
pthread_create(&tid[0], NULL, count, (void*) 0);
pthread_create(&tid[1], NULL, count, (void*) 1);
pthread_join(tid[0], NULL);
pthread_join(tid[1], NULL);
printf("cnt=%d\n", cnt);
exit(0);
}
避免死锁:使用同样的顺序加锁
xxxxxxxxxx
void *count(void *vargp) {
int i;
int id = (int) vargp;
for (i = 0; i < NITERS; i++) {
P(&mutex[0]); P(&mutex[1]);
cnt++;
V(&mutex[id]); V(&mutex[1-id]);
}
return NULL;
}
xxxxxxxxxx
int main() {
printf("Hello world\n");
exit(0);
return 0;
}
gcc -S -Og helloworld.c
xxxxxxxxxx
.file "hello.c"
.section .rodata
.LC0:
.string "Hello world"
.text
.globl main
.type main, @function
main:
subq $8, %rsp
movl $.LC0, %edi
call puts
movl $0, %edi
call exit
.
开头的行都是汇编指示,如.file
,.def
,.text
等,用以直到汇编器如何进行汇编。其中.file
,.def
,.CFI
等均用于调试(可以忽略)
示例:.globl main
表示汇编器符号main
是全局的,.LC0
则不是全局可见的:
结尾的字符串(如main:
)是用以表示变量或者函数的地址的符号.text
或.section .text
表示代码段
.data
或.section .rodata
表示(只读)数据段.p2align 4,,15
指定下一行的对齐方式第一个参数表示按2的4
次幂字节对齐,第二个参数表示对齐时额外空间用什么数据来填充,第三个参数表示最多允许额外填充多少字节
int c = 1;
:初始化全局变量xxxxxxxxxx
.section .rodata
LC0: .string "Hello World"
.global c
.data
.align 4
c: .long 1
as my_file.s -o my_object_file.o
:加入-gstabs
选项产生带调试信息的object文件
ld my_object_file.o -o my_exe_file
:可以有多个.o
文件;加入-g
选项产生调试信息
"Hello World.c"
xxxxxxxxxx
.data # 数据段
msg: # 字符串的地址
.ascii "Hello World\n"
len = .-msg # “.”表示当前地址
.text # 代码段
.globl _start # 汇编程序的入口
_start:
movq $len, %rdx
movq $msg, %rsi
movq $1, %rax # 系统输出(write系统调用)
movq $1, %rdi # stdout
syscall
movq $60, %rax # 程序退出
movq $0, %rdi # 退出值
syscall
X84-64 Linux下的系统调用是通过syscall
指令(一种异常机制)实现的
%rax
存放系统调用的功能号
传给系统调用的参数必须按顺序存放到寄存器%rdi
, %rsi
, %rdx
, %r10
, %r8
, r9
中
当系统调用完成之后,返回值可以再寄存器%rax
中获得(一般<0表示错误)
相当于C语言形式int main(int argv, char *argv[])
(*) 当一个可执行程序通过命令行启动时,命令行参数将被保存到栈中
xxxxxxxxxx
.text
.globl _start
_start:
popq %rsi # argc
vnext:
popq %rsi # argv[?]
testq %rsi, %rsi # 空指针表明结束
jz exit # 即je
movq %rsi, %rbx
xorq %rdx, %rdx # rdx = 0
strlen:
movb (%rbx), %al
incq %rdx
incq %rbx
testb %al, %al
jnz strlen
movb $10, -1(%rbx) # 10是换行键
movq $1, %rax # 系统调用号(sys_write)
movq $1, %rdi # 文件描述符(stdout)
syscall
jmp vnext
exit:
movq $60, %rax # 程序退出
movq $0, %rdi # 退出值
syscall
gdb调试:
lib_c
库函数汇编命令:as hello.s -o hello.o
ld -lc -dynamic-linker /lib64/ld-linux-x86-64.so.2 hello.o -o hello
xxxxxxxxxx
.section .rodata
.LC0:
.string "Hello World"
.text
.globl _start
_start:
movl $.LC0, %edi
call puts
movl $0, %edi
call exit
三个常用的段
.data
:数据段(声明带有初始值的数据)
.bss
:数据段(声明无需初始化的数据)
.text
:正文段(程序指令)
程序入口地址:_start
符号表示默认的起始点
如想汇编内部的符号能够被外部模块访问,需赋予.globl
属性,如.globl _start
xxxxxxxxxx
output:
.ascii "hello world"
pi:
.float 3.14
# 声明可以在一行中定义多个值,如:
age:
.int 20, 10, 30, 40 # 数组
类型说明:
和data
段不同, 无需声明特定的数据类型, 只需声明为所需目的保留的一定长度的内存即可
.comm
声明为未初始化的全局内存区域
.lcomm
声明为未初始化的局部内存区域
xxxxxxxxxx
.section .bss
.lcomm buffer, 1000
#该语句把1000字节的内存地址赋予buffer, 外部模块不能访问他们
factorial.s
)xxxxxxxxxx
.section .text
.globl factorial #this is unneeded unless we want to share it
.globl _start
_start:
movl $4, %edi #The factorial takes one argument –
#the number we want a factorial of.
call factorial #run the factorial function
movl %eax, %edi #factorial returns the answer in %eax, but we
#want it in %edi to send it as our exit status
movq $60, %rax #exit code
syscall
#This is the actual function definition: factorial(n)
.type factorial, @function
factorial:
movl $1, %eax
cmpl $1, %edi #If the number is 1, that is our base case, and
#we simply return
je end_factorial
pushq %rdi
decl %edi #otherwise, decrease the value
call factorial #call itself
popq %rdi
imull %edi, %eax #multiply that by the result of the last call to
#factorial; the answer is stored in %eax.
end_factorial:
ret
xxxxxxxxxx
extern void stats(int[], int, int *, int *);
int main() {
int lst[] = {1, -2, 3, -4, 5, 7, 9, 11};
int len = 8;
int sum, ave;
stats(lst, len, &sum, &ave);
printf ("Stats:\n");
printf ("Sum = %d \n", sum);
printf ("Ave = %d \n", ave);
return 0;
}
xxxxxxxxxx
# Function to find the integer sum and integer average for a passed list of signed integers.
# Call:
# stats(lst, len, &sum, &ave);
# Arguments Passed:
# 1) rdi - address of array
# 2) rsi - length of passed array
# 3) rdx - address of variable for sum
# 4) rcx - address of variable for average
# Returns:
#
.section .text
.globl stats
stats:
pushq %r12 # callee saved
movq $0, %r11 # index
movl $0, %r12d # sum
sumLoop:
movl (%rdi,%r11,4), %eax #get lst[i]
addl %eax, %r12d #update sum
incq %r11 #index++
cmpq %rsi, %r11
jb sumLoop
movl %r12d, (%rdx) #return sum
movl %r12d, %eax
cltd #sign-extend %eax —> %edx:%eax
idivl %esi
movl %eax, (%rcx) #return average
#Done, return …
popq %r12
ret
略