Lec8: Main Memory - 我的博客

Main Memory

![NotebookLM Mind Map (8)](./assets/NotebookLM Mind Map (8).png)

Background

Program must be brought (from disk) into memory and placed within a process for it to be run
Main memory and registers are only storage CPU can access directly
Register access in one CPU clock (or less)
Main memory can take many cycles
Cache sits between main memory and CPU registers
Protection of memory required to ensure correct operation
对于 CPU 来说，访问 main memory 来说是非常慢的

Memory Hierarchy

Emerging NVM Technologies

NVMs :
- non-volatile; low idle power; no refreshes; high write overheads; etc.
Phase-ChangeMemory(PCM):
- Intel/Micron 3D Xpoint
- Intel Optane DC PersistentMemory / DC SSD
ReRAM/RRAM:
- Arbitrary programmed cell resistance (“memristor”).
- First invented by HP Labs, now produced by many companies (in early stage).
non-volatile：非易失性。也就是断电之后数据不会丢失。

Base and Limit Registers

在计算机运行的时候，我们需要保证每一个进程都有一个单独的内存空间，单独的进程内存空间可以保护进程而不受相互影响，这对于将多个进程加载到内存以便并发执行来说至关重要。

A pair of base and limit registers define the logical address space
为了确定每个进程的内存空间，我们设置了一对寄存器 base 和 limit 来确定 logical address space。具体来说，就是== $[base, base+limit)$ ==这样的空间。

Addresses Given in Different Ways

Symbolic Address: Addresses in the source program are generally symbolic (such as the variable count).
- 符号地址。程序员使用
Relocatable Addresses: A compiler typically binds these symbolic addresses to relocatable addresses (such as “14 bytes from the beginning of this module”).
- 可重定位地址。编译器使用
Absolute Addresses: The linker or loader binds the relocatable addresses to absolute addresses (such as 74014).
- 绝对地址。memory 使用

Multistep Processing of a User Program

A compiler is a computer program (or set of programs) that transforms source code written in a computer language (the source language) into another computer language (the target language, often having a binary form known as object code).
A linker or linkage editor is a program that takes one or more objects generated by a compiler and combines them into a single executable program.
A loader loads the .exe file into memory for execution.

步骤	处理程序	输入	输出
翻译	编译器	源代码 (Symbolic)	目标模块 (Relocatable)
组合	链接器	多个目标模块	可执行文件 (.exe)
执行前准备	加载器	可执行文件	内存中的程序 (Absolute)

程序在：
- compile time：使用 Symbolic Address
- load time：使用 Relocatable Addresses
- execution time(run time)：使用 Absolute Addresses

Binding of Instructions and Data to Memory

将一个地址，通过某种方式，映射到另外一个地址，这样的行为叫做 Address binding。

Address binding of instructions and data to memory addresses can happen at three different stages
- Compile time（编译时刻）: If memory location known a priori, absolute code can be generated; must recompile code if starting location changes
- 如果编译的时候就已经知道进程将在内存中的主流地址，那么就可以生成绝对代码。如果将来开始地址发生变化，那么就有必要重新编译代码。
- Load time（装入时刻）: Must generate relocatable code if memory location is not known at compile time
- 若在编译时并不知道进程将驻留在何处，那么编译器就应该生成可重定位代码。对于这种情况，最后绑定会延迟到加载时才会进行。若果开始地址发生变化，只需要重新加载用户代码以合并更改的值。
- Execution time（执行时刻）: Binding happens when instruction is executed. Process can be moved during its execution from one memory segment to another. Need hardware support for address maps (e.g., base and limit registers)
- 如果进程可以在执行的时候可以从一个内存段转移到赢一个内存段，那么绑定应该延迟到执行时才进行。采用这种方法需要特定的硬件才可以。现代计算机基本上都是用这种方法。
  - 实际上，virtual adress 的方法就属于这一种
  - 可以给进程最大程度的灵活性。

阶段	触发时机	生成的代码类型	灵活性	硬件要求
编译时 (Compile Time)	代码编译期间	绝对代码 (Absolute Code)	最低：起始地址改变必须重新编译。	无特殊要求
加载时 (Load Time)	程序调入内存时	可重定位代码 (Relocatable Code)	中等：起始地址改变只需重新加载。	无特殊要求
执行时 (Execution Time)	指令实际执行时	逻辑/虚拟地址	最高：程序运行中可在内存间移动。	需要 MMU（如基址/界限寄存器）

Logical vs. Physical Address Space

The concept of a logical address space that is bound to a separate physical address space is central to proper memory management
- Logical address – generated by the CPU; also referred to as virtual address
- Physical address – address seen by the memory unit
Logical and physical addresses are the same in compile-time and load-time address-binding schemes;
Logical (virtual) and physical addresses differ in execution-time address-binding scheme

Memory-Management Unit (MMU)

对于 Execution time binding（执行时绑定），我们需要一个硬件来实现这个功能。

Hardware device that maps virtual to physical address
In MMU scheme, the value in the relocation register is added to every address generated by a user process at the time it is sent to memory
The user program deals with logical addresses; it never sees the real physical addresses

Dynamic Loading

Routine is not loaded until it is called
Better memory-space utilization; unused routine is never loaded
Useful when large amounts of code are needed to handle infrequently occurring cases
No special support from the operating system is required implemented through program design

一个程序只有被需要的时候，才会被加载。

Dynamic Linking

Linking postponed until execution time
- Small piece of code, stub, used to locate the appropriate memory-resident library routine
- Stub replaces itself with the address of the routine, and executes the routine
Operating system needed to check if routine is in processes’ memory address
Dynamic linking is particularly useful for libraries
- Saves main memory space
- Reduces size of exe image file
- Relinking of new library not needed
System also known as shared libraries

Contiguous Allocation

Main memory usually into two partitions:
- Resident operating system, usually held in low memory with interrupt vector
  - 驻留操作系统区 (Resident OS)：通常保存在低端内存中，并包含中断向量。
- User processes then held in high memory
  - 用户进程区：保存在高端内存中。
Relocation registers used to protect user processes from each other, and from changing operating-system code and data
为了保护用户进程不互相干扰，并防止它们修改操作系统的代码和数据，系统使用了两个关键寄存器
- Relocation register contains value of smallest physical address
- 重定位寄存器 (Relocation Register)：存储进程最小的物理地址值。
- Limit register contains range of logical addresses – each logical address must be less than the limit register
- 界限寄存器 (Limit Register)：存储逻辑地址的范围。因为我们的逻辑地址总是从0开始，因此 Limit Register 中储存的值实际上就逻辑地址上限值。
- MMU maps logical address dynamically
Single contiguous memory allocation（单一连续内存分配）：
- 只能运行一个程序：整个内存分配给一个进程，其他程序必须等待它结束，无法并发运行。
- 没有隔离性：没有内存保护机制，程序之间容易互相干扰
- 资源利用率低：内存空闲部分无法被其他程序使用，造成浪费。
- 不支持进程切换：多道程序要求多个程序同时驻留内存并交替运行，而这种方式不支持。

Multiple-partition allocation
- Hole – block of available memory; holes of various size are scattered throughout memory
- When a process arrives, it is allocated memory from a hole large enough to accommodate it
- Operating system maintains information about:
  - a) allocated partitions
  - 已分配分区 (Allocated partitions)：哪些内存区域正被哪些进程占用。
  - b) free partitions (hole)
  - 空闲分区 (Free partitions/Holes)：哪些区域是空的，以及它们各自的大小和位置。

Dynamic Storage-Allocation Problem

How to satisfy a request of size n from a list of free holes
- First-fit: Allocate the first hole that is big enough
- Best-fit: Allocate the smallest hole that is big enough; must search entire list, unless ordered by size
  - Produces the smallest leftover hole
- Worst-fit: Allocate the largest hole; must also search entire list Produces the largest leftover hole
First-fit and best-fit better than worst-fit in terms of speed and storage utilization
Question: What is the problem with dynamic allocation?
- 碎片化

策略	分配逻辑	碎片特点	效率
First-fit	第一个够大的	产生碎片较快	最高
Best-fit	最小且够大的	产生极小且无用的碎片	中等
Worst-fit	最大的	剩下的孔隙较大，较好利用	较低

Fragmentation

External Fragmentation – total memory space exists to satisfy a request, but it is not contiguous
系统中存在足够的总内存空间来满足一个请求，但这些空间是不连续的，散布在多个小孔隙中。外部碎片在进程内存之外
Internal Fragmentation (refer to the textbook p287) – allocated memory may be slightly larger than requested memory; this size difference is memory internal to a partition, but not being used
分配的时候，我们可能会分配给进程略多的内存，多出来的内存就称为内部碎片
Reduce external fragmentation by compaction
- Shuffle memory contents to place all free memory together in one large block
- Compaction is possible only if relocation is dynamic, and is done at execution time
- 压缩；被压缩的进程无法运行
- I/O problem
- 在进行内存紧凑（移动进程）时，如果进程正在进行 I/O 操作，会产生严重问题。为此有两种常见的解决方案：
  - Latch job in memory while it is involved in I/O
  - 锁定 (Latch)：在进程参与 I/O 期间，将其锁定在内存中，不允许移动。
  - Do I/O only into OS buffers
  - OS 缓冲区：仅将 I/O 数据传输到操作系统的特定缓冲区，而不是直接传到进程内存中，从而允许进程在内存中被移动。
Another solution to external frag. is non-contiguous allocation

Paging

Logical address space of a process can be noncontiguous; process is allocated physical memory whenever the latter is available
Divide physical memory into fixed-sized blocks called frames (size is power of 2, between 512 bytes and 8,192 bytes)
Divide logical memory into blocks of same size called pages
Keep track of all free frames
To run a program of size n pages, need to find n free frames and load program
Set up a page table to translate logical to physical addresses
Internal fragmentation
分配的时候，我们可以将 pages 分配到不连续的 frames 上去，做到不连续分配。
Page 解决了外部碎片，但没有解决内部碎片

对于每个进程来说，需要维护一个 Pagetable。对于 frame 来说，我们需要知道哪些 frame 是空闲的，因此，我们需要维持一个 free frame list(frame table)，用于指示哪些 frame 是空闲的。

Ex1:

对于上面的例子来说，我们知道：
Page size = 4B，推出 Page offset 为 2 位
总共有 4 个 Pages，推出 p = 2
总共有 8 个 frames，推出 f = 3
因此可以推导出：
- logical address 的长度：4 bits
- physical address 的长度：5 bits

Ex2:

对一些概念进行澄清（之前的课程学得太差了）：32位说的是 address 的长度，一个 address 指向 1B 的“内容”。size、空间都是指这些“内容”
对于上面的例子，可以看到，32位的体系架构中，最大的虚拟地址空间就是 4GB。

Hardware Implementation of Page Table

Page table is kept in main memory
Page-table base register (PTBR) points to the page table
PTBR 指向的是页表的起始位置（物理地址；若是虚地址，则还需要从 Pagetable 中找到物理地址，导致错误）
Page-table length register (PTLR) indicates size of the page table
理论上来说，PTBR 和 PTLR 都需要被 PCB 所记录
In this scheme every data/instruction access requires two memory accesses. One for the page table and one for the data/instruction.
任何的地址访问，都需要两次的内存访问，因此需要 TLBs
The two-memory-access problem can be solved by the use of a special fast-lookup hardware cache called associative memory or translation look-aside buffers (TLBs 转换旁视缓冲, 一称快表)
实际上，就是一个 cache，用来存储页表用的。
Some TLBs store address-space identifiers (ASIDs) in each TLB entry – uniquely identifies each process to provide address-space protection for that process

TLB

Address translation (p,d)
If p is in associative register, get frame # out
Otherwise get frame # from page table in memory

Paging Hardware With TLB

以上是 Paging Hardware With TLB
对于每一个进程，都有一个 TLB，因此，在进程切换的时候，TLB 也应该作为 PCB 的一部分被记录存储。如果我们只想让一个 TLB 作用所有的进程，我们应该给 TLB 的条目额外增加一个字段 ASID，用于标识一个进程。

Effective Access Time

Memory Protection in Paged Scheme

Memory protection implemented by associating protection bit with each frame
Valid-invalid bit attached to each entry in the page table:
- “valid” indicates that the associated page is in the process’ logical address space, and is thus a legal page
- “invalid” indicates that the page is not in the process’ logical address space
可以使用一个 valid 字段表示 PTE 是否有效

Shared Pages

Shared code
- One copy of read-only (reentrant) code shared among processes (i.e., text editors, compilers, window systems).
- Shared code must appear in same location in the logical address space of all processes
- 我们可以共享代码。有时候，对于一个代码，我们会开很多个进程来执行。如果每一个进程所对应的代码的物理地址不同的话，是非常浪费空间的。因此，我们就可以让这些进程使用物理内存中同一段代码，这被称为 Shared code
- 采用 shared code 的时候，显然，不同进程的代码的 logical address 是一样的，因为 physical address 是一样的。
  - 地址冲突问题：如果进程 A 将共享库加载在逻辑地址 0x1000，而进程 B 将其加载在 0x5000，那么代码内部的一条指令（例如“跳转到 0x1050”）对进程 A 是正确的，但对进程 B 来说就会跳转到错误的位置。

Private code and data
- Each process keeps a separate copy of the code and data
- The pages for the private code and data can appear anywhere in the logical address space

Hierarchical Page Tables

有时候，如果我们只使用一个页表，它的大小会非常大，这对我们的系统非常不利（比如加载到内存）；因此，我们选择对 page 进行再次的分页。

Break up the logical address space into multiple page tables – to page the page table
A simple technique is a two-level page table

一个例子如下所示：

$p_1$ 是用来访问外部页表的索引，而 $p_2$ 是内部页表的页偏移。

Two-Level Paging Example

A logical address (on 32-bit machine with 1K page size) is divided into:
- a page number consisting of 22 bits
- a page offset consisting of 10 bits
Since the page table is paged, the page number is further divided into:
- a 12-bit page number
- a 10-bit page offset
Thus, a logical address is as follows:

where p1 is an index into the outer page table, and p2 is the displacement within the page of the outer page table

由于地址转换从外向内，这种方案也被称为向前映射页表（forward-mapped page table）。

Three-level Paging Scheme

原理和二级页表的类似，这里就不多解释。

Hashed Page Tables

Common in address spaces > 32 bits
The virtual page number is hashed into a page table. This page table contains a chain of elements hashing to the same location.
- Each element contains Virtual page number, frame no., a pointer to next element
Virtual page numbers are compared in this chain searching for a match. If a match is found, the corresponding physical frame is extracted.
Variation for 64-bit addresses is the clustered page table
Each element refers to several pages (say 16) rather than 1
Useful for sparse address spaces where memory references are scattered and non-contiguous

Inverted Page Table

One entry for each real page of memory
Entry consists of the virtual address of the page stored in that real memory location, with information about the process that owns that page
Decreases memory needed to store each page table, but increases time needed to search the table when a page reference occurs
Use hash table to limit the search to one — or at most a few — page-table entries

Segmentation

段式分配是从程序的角度来考虑的

Segmentation Architecture

接下来介绍一下 Segmentation 的结构。实际上，其与 Paging 的结构是类似的：

Logical address consists of a two tuple:
- <segment-number, offset>,
Segment table – maps two-dimensional physical addresses; each table entry has:
- base – contains the starting physical address where the segments reside in memory
- limit – specifies the length of the segment
Segment-table base register (STBR) points to the segment table’s location in memory
Segment-table length register (STLR) indicates number of segments used by a program;
- segment number s is legal if s < STLR

不同的分段的功能是不一样的，例如，有些段是用来写的，有写是用来读的，有写是用来执行的。针对不同的段，我们都需要标识这个段的功能（或者说是权限）：

Protection
- With each entry in segment table associate:
  - validation bit = 0 => illegal segment
  - read/write/execute privileges
Protection bits associated with segments; code sharing occurs at segment level
Since segments vary in length, memory allocation is a dynamic storage-allocation problem
A segmentation example is shown in the following diagram

Swapping

A process can be swapped temporarily out of memory to a backing store, and then brought back into memory for continued execution
一个进程可以暂时从内存交换到备份存储（backing store），当再次执行的时候交换到内存中。交换有可能让所有的进程的总的物理地址空间超过真实系统的物理地址空间，从而增加了系统多到程序程度（degree of multiprogramming），也就是内存中的进程数量
Standard swapping: moving entire processes between main memory and a backing store
- Entire process image swapped to the backing store

Swapping with paging: pages of a process can be swapped
- A page out operation moves a page from memory to the backing store
- The reverse process is known as a page in
- Much less expensive

The Intel Pentium

根据上面的知识，我们知道，纯粹的 Segmentation 和 paging 都有一些问题。前者会导致外部碎片、hole 的出现，后者会导致内部碎片的出现。

接下来一种架构，采用了将 Segamentation 和 paging 结合在一起的方式，我们称为段页式：

Supports both segmentation and segmentation with paging
CPU generates logical address
- Logical address space divided into local and global partitions.
- LDT vs. GDT tables
- 这两个 table 类似于之前的 table，都是用来存储 entry 的。前者是 local 的，是给段自己用的；后者是 global 的，是可以全局进行访问的。
- Given to segmentation unit
  - Which produces linear addresses
- Linear address given to paging unit
  - Which generates physical address in main memory
  - Paging units form equivalent of MMU

字段	名称
Frame Number	物理页框号
Valid/Invalid Bit	有效位
Dirty Bit	修改位 (脏位)
Reference Bit	引用位 (访问位)
Protection Bits	保护位