MIT6.S081 System

[Lec 1] Introduction & Examples

A good OS has to balance the following pairs of goals, which are hard to reconcile:

efficient & well-abstracted
powerful & simple api
flexible & secure

Always check the return value of a system call to detect errors.

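fd = open("filename", 0) returns the lowest-numbered file descriptor that is not currently in use. 0 is the default open mode (equal to O_RDONLY); other flags include O_CREAT, O_APPEND, O_RDONLY, O_WRONLY, and O_RDWR. Flags can be combined with bitwise OR, e.g. O_CREAT|O_RDWR.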
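The two points above (check every return value; open() hands out the lowest free descriptor) can be sketched in C as follows. This is a minimal illustration for a POSIX system, not xv6 code; the function name reopen_reuses_lowest_fd is made up for this note, and it assumes no other thread opens or closes files in between.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Open `path` read-only, close it, and open it again.
 * Returns 1 if the second open() got the same descriptor back (the lowest
 * free one), 0 if it did not, and -1 if any system call failed.
 * Hypothetical helper; assumes a single-threaded process. */
int reopen_reuses_lowest_fd(const char *path)
{
    int first = open(path, O_RDONLY);
    if (first < 0) {            /* always check the return value */
        perror("open");
        return -1;
    }
    if (close(first) < 0) {
        perror("close");
        return -1;
    }

    /* `first` is free again, so it is the lowest unused descriptor. */
    int second = open(path, O_RDONLY);
    if (second < 0) {
        perror("open");
        return -1;
    }

    int same = (second == first);
    close(second);
    return same;
}
```

Calling reopen_reuses_lowest_fd("/dev/null") should report that the descriptor was reused, while a path that does not exist takes the error branch.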

pidof: given a program name, returns its PID.

(In real Unix) /proc/pid_of_process/ holds system-level information about the process. The links in its fd directory are the process's file descriptors and the objects they point to.

inode

An inode is how a Unix file system stores a file's information on disk. Use stat to see the inode that a file name maps to.

$ stat README.md
  File: README.md
  Size: 113             Blocks: 8          IO Block: 4096   regular file
Device: 820h/2080d      Inode: 434332      Links: 1
Access: (0755/-rwxr-xr-x)  Uid: ( 1000/    xing)   Gid: ( 1000/    xing)
Access: 2022-02-22 23:20:21.504782371 +0800
Modify: 2022-02-22 21:09:04.114819990 +0800
Change: 2022-02-22 21:09:04.114819990 +0800
 Birth: -

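ln srcfile destfile creates a hard link: destfile gets the same inode number as srcfile, and both file names are merely links to the same inode on disk. When the number of links to an inode drops to 0 (and no process still has the file open), the operating system reclaims the file's storage.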
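The hard-link behavior can be checked from C with link() and stat(). The sketch below is a POSIX illustration only; the function name hard_link_count and the /tmp path template are assumptions made for this note, and it assumes /tmp is on a file system that supports hard links.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a temporary file, add a second name for it with link(), and
 * return the link count reported by stat() if both names map to the
 * same inode. Returns -1 on any error. Hypothetical helper. */
int hard_link_count(void)
{
    char src[] = "/tmp/inode_demo_XXXXXX";   /* assumed scratch location */
    char dst[64];
    struct stat a, b;
    int result = -1;

    int fd = mkstemp(src);                   /* creates the file, 1 link */
    if (fd < 0)
        return -1;
    close(fd);

    snprintf(dst, sizeof dst, "%s.link", src);
    if (link(src, dst) == 0) {               /* same effect as: ln src dst */
        if (stat(src, &a) == 0 && stat(dst, &b) == 0 &&
            a.st_ino == b.st_ino)
            result = (int)a.st_nlink;        /* both names, one inode */
        unlink(dst);                         /* drop the second name */
    }
    unlink(src);                             /* link count reaches 0 */
    return result;
}
```

Each unlink() only removes one name; the inode itself is released after the last name is gone.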

The inode design is essentially an abstraction over files, and thus a form of isolation.

[Lec 3] OS Organization

Kernel mode vs. user mode: the processor tells them apart with a flag. Conceptually, flag=0 means kernel mode and flag=1 means user mode.

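An application requests a system call by executing ecall with the system call number n (on RISC-V xv6 the number is passed in register a7). Execution enters the kernel at that point and continues there.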

kernel/syscall.h defines the mapping between n and each syscall.

Use riscv64-linux-gnu-objdump -d exename to view the disassembly of exename.

The kernel uses timer interrupts to switch which process runs on a CPU.

kernel = trusted computing base

the kernel must be bug-free
the kernel must treat all processes as malicious

In kernel/syscall.c, the syscall table is built with designated initializers.
With designated initializers (ISO C99), individual array members can be initialized explicitly by index; members not named are set to zero.
The compiler can also deduce the length of the array for you if you leave the square brackets empty.

int array[5] = {[2] = 5, [1] = 2, [4] = 9}; /* array is {0, 2, 5, 0, 9} */
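The same feature is what makes a syscall dispatch table easy to write: the syscall number is the array index. Below is a small sketch in the spirit of kernel/syscall.c, not a copy of it. The numbers, the two stub handlers, and the dispatch() function are hypothetical and chosen only for illustration.

```c
#include <stddef.h>

/* Hypothetical syscall numbers for this sketch (not the xv6 values). */
#define SYS_getpid 1
#define SYS_uptime 3

/* Stub handlers returning fixed values so the example is deterministic. */
static long sys_getpid(void) { return 42; }
static long sys_uptime(void) { return 1000; }

/* Empty brackets: the compiler sizes the array from the largest index (3),
 * so it has 4 slots. Slots 0 and 2 are not named and therefore NULL. */
static long (*syscalls[])(void) = {
    [SYS_getpid] = sys_getpid,
    [SYS_uptime] = sys_uptime,
};

#define NSYSCALLS (sizeof(syscalls) / sizeof(syscalls[0]))

/* Look up the handler for `num` and call it; -1 for unknown numbers. */
long dispatch(int num)
{
    if (num > 0 && (size_t)num < NSYSCALLS && syscalls[num] != NULL)
        return syscalls[num]();
    return -1;
}
```

The bounds check and the NULL check matter because the number comes from user space and cannot be trusted.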

qemu-system-riscv64: terminating on signal 15 from pid xx (make): make received a termination request, so it sent a termination signal (signal 15, SIGTERM) to the emulator.

[Lec 4] Page Tables

each process has its own memory map va->pa
the map is stored in memory
MMU looks at memory and does the translation
the satp register in the CPU points to where the page table is stored
satp is maintained by kernel, only modified by kernel

Index(27b) + offset(12b) -> PPN(44b) + offset(12b)

*PPN: physical page number

satp -> address of top level page table

Index -> 9b + 9b + 9b (one 9-bit index per page table level)
-> the top 9 bits are the key into the top level page table; entries in the table are called PTEs (page table entries)
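The 9 + 9 + 9 + 12 split can be written as a few shifts and masks. The sketch below follows the layout described above; the function names va_index and va_offset are made up for this note (xv6 expresses the same idea with macros in kernel/riscv.h).

```c
#include <stdint.h>

#define PGSHIFT 12      /* 12 offset bits: 4096-byte pages */
#define PXMASK  0x1FF   /* 9 bits per level */

/* Index into the page table at `level`: 2 is the top level, 0 the last.
 * Hypothetical helper name. */
uint64_t va_index(uint64_t va, int level)
{
    return (va >> (PGSHIFT + 9 * level)) & PXMASK;
}

/* Byte offset inside the page; copied unchanged into the physical address. */
uint64_t va_offset(uint64_t va)
{
    return va & ((1ULL << PGSHIFT) - 1);
}
```

For example, a virtual address built as (5 << 30) | (3 << 21) | (7 << 12) | 0x123 decomposes into indices 5, 3, 7 and offset 0x123.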

top level page table

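the table's size is $2^9 \times (64/8) = 2^{12} = 4096$ bytes, so it occupies exactly one page
each PTE holds the PPN of the next level page table (at the last level, the PPN of the target page) and takes 64 bits
the PPN is a physical address, not a virtual one: the translation service cannot depend on another translation service
the address a PTE points to always has its low 12 bits equal to 0 (pages are 4096-byte aligned), so those bits need not be stored; the lowest 10 bits of the PTE are used for translation control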

from high(9) to low(0) are:

RSW(2): reserved for supervisor software
D(1) : dirty
A(1) : Accessed
G(1) : Global
U(1) : User - accessible by process running in user space
X(1) : Executable - executing instructions from it allowed
W(1) : Writable - writing to page allowed
R(1) : Readable - reading from page allowed
V(1) : Valid - it’s a valid PTE for translation
a PTE must be assigned (V set) before it is used; a PTE without V has not been assigned
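The flag layout above maps directly onto bit masks. The following sketch packs and unpacks a PTE as described (flags in bits 0..9, a 44-bit PPN starting at bit 10). The helper names pa_to_pte, pte_to_pa, pte_flags, and user_can_write are assumptions for this note; xv6 has similar macros in kernel/riscv.h.

```c
#include <stdint.h>

/* Flag bits, from low (0) upward, matching the list above. */
#define PTE_V (1ULL << 0)   /* valid */
#define PTE_R (1ULL << 1)   /* readable */
#define PTE_W (1ULL << 2)   /* writable */
#define PTE_X (1ULL << 3)   /* executable */
#define PTE_U (1ULL << 4)   /* accessible from user mode */
#define PTE_G (1ULL << 5)   /* global */
#define PTE_A (1ULL << 6)   /* accessed */
#define PTE_D (1ULL << 7)   /* dirty */

#define PTE_FLAG_MASK 0x3FFULL            /* lowest 10 bits */
#define PPN_MASK      ((1ULL << 44) - 1)  /* 44-bit physical page number */

/* Build a PTE from a page-aligned physical address and flag bits. */
uint64_t pa_to_pte(uint64_t pa, uint64_t flags)
{
    return ((pa >> 12) << 10) | (flags & PTE_FLAG_MASK);
}

/* Recover the physical page address; its low 12 bits are always 0. */
uint64_t pte_to_pa(uint64_t pte)
{
    return ((pte >> 10) & PPN_MASK) << 12;
}

/* Extract only the control flags. */
uint64_t pte_flags(uint64_t pte)
{
    return pte & PTE_FLAG_MASK;
}

/* A user-mode store is allowed only if V, U, and W are all set. */
int user_can_write(uint64_t pte)
{
    uint64_t need = PTE_V | PTE_U | PTE_W;
    return (pte & need) == need;
}
```

A kernel text page would typically carry V, R, and X but not U, so user_can_write() rejects it.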

TLB: translation lookaside buffer

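avoids the three memory accesses that a full page table walk would need for every translation
caches [VA, PA] mappings
On switching to another process's page table, the OS tells the TLB to flush itself (with the sfence.vma instruction, wrapped as sfence_vma() in xv6). On other occasions the OS and the TLB do not communicate; the CPU communicates with the TLB.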

Kernel lives in [KERNBASE,PHYSTOP)=[0x80000000,?)

The kernel sets up the first page table. To keep things simple, this page table is mostly an identity mapping (VA == PA), and the mapping is exactly the identity on [KERNBASE, PHYSTOP).

PA >= 0x80000000: refers to DRAM
PA < 0x80000000: communicates with other hardware on the board (memory-mapped devices)

[Lec 5] RISC-V Calling Conventions

GDB

tui enable
layout {split,asm,src}
p $pc
p /x *argv@argc
x /2c $a1 (print 2 characters)
x /6i 0x3.... (print 6 instructions)
info reg
apropos cmd (search the manual for cmd)

<Ctrl-a c>: goes into qemu console

info mem: print the page table specified by satp

[Lec 6] Trap

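On a transition from user mode to kernel mode, the kernel saves all 32 user registers as they are.
What supervisor mode can do:

read and write control registers
SATP, STVEC, SEPC, SSCRATCH
use PTEs that do not have the PTE_U flag set

What it cannot do:

read or write arbitrary physical addresses (it must go through PTEs)
use PTEs that have the PTE_U flag set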

Trap flow:

ecall (does not switch page tables; switches to supervisor mode and saves the user [pc] in [sepc])
uservec (an assembly function in the trampoline)
usertrap saves [sepc] in the trapframe
syscall()
sys_xxx()
syscall()
usertrapret()
userret

[Lec 8] Page Faults

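Page mappings:

static: the kernel manipulates page tables at boot and on fork
dynamic: the kernel updates them in response to page faults

What information does the kernel need to respond to a page fault?

the faulting virtual address
page faults use the trap mechanism, which writes the virtual address whose access failed into [stval]
the type of the page fault (available in [scause])
the virtual address of the instruction that caused the page fault; it is stored in [sepc] and then saved in trapframe::epc
after the fault is fixed, that instruction has to be executed again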

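sbrk(): by default uses eager allocation, i.e. the kernel allocates physical memory as soon as sbrk() is called. With lazy allocation, sbrk() only grows the process's address space size, and the kernel allocates physical memory on demand in the page fault handler.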
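The lazy scheme can be simulated in user space to make the control flow concrete. The sketch below is not xv6 kernel code: struct lazy_proc, its flat pages[] array standing in for a page table, and all function names are hypothetical. It does use the RISC-V scause values for load page faults (13) and store page faults (15), and treats stval as the faulting virtual address, as described above.

```c
#include <stdint.h>
#include <stdlib.h>

#define PGSIZE 4096u
#define NPAGES 16            /* tiny simulated address space */

#define SCAUSE_LOAD_PAGE_FAULT  13
#define SCAUSE_STORE_PAGE_FAULT 15

/* Hypothetical stand-in for a process: size plus a one-level "page table". */
struct lazy_proc {
    uint64_t sz;             /* bytes the process believes it owns */
    void *pages[NPAGES];     /* NULL means "no physical page mapped yet" */
};

/* Lazy sbrk: only grow sz, allocate nothing. Returns old size or -1. */
int64_t lazy_sbrk(struct lazy_proc *p, uint64_t n)
{
    uint64_t old = p->sz;
    if (n > (uint64_t)NPAGES * PGSIZE - old)
        return -1;
    p->sz = old + n;
    return (int64_t)old;
}

/* Page fault handler: returns 0 if a page was mapped (the faulting
 * instruction can be retried), -1 if the process should be killed. */
int handle_page_fault(struct lazy_proc *p, uint64_t scause, uint64_t stval)
{
    if (scause != SCAUSE_LOAD_PAGE_FAULT && scause != SCAUSE_STORE_PAGE_FAULT)
        return -1;                      /* not a fault we handle lazily */
    if (stval >= p->sz)
        return -1;                      /* address was never granted by sbrk */
    uint64_t idx = stval / PGSIZE;
    if (p->pages[idx] != NULL)
        return -1;                      /* already mapped: a genuine error */
    p->pages[idx] = calloc(1, PGSIZE);  /* zero-filled page, allocated on demand */
    return p->pages[idx] != NULL ? 0 : -1;
}

/* Count how many pages actually have memory behind them. */
int mapped_pages(const struct lazy_proc *p)
{
    int n = 0;
    for (int i = 0; i < NPAGES; i++)
        if (p->pages[i] != NULL)
            n++;
    return n;
}
```

After lazy_sbrk() grows the process by three pages, mapped_pages() is still 0; only the page that is actually touched gets memory, which is the point of allocating on demand.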