|
1 | 1 | [[intro]] |
2 | 2 | == Introduction |
3 | 3 |
|
4 | | -Significant production deployed software is written in languages which provide |
5 | | -powerful primitives (like C/C++) to software developers to manage memory. |
| 4 | +A significant amount of production software is written in languages (like |
| 5 | +C/C++) that provide powerful memory management primitives. |
6 | 6 | Software is written in collaborative fashion and on tight deadlines and bugs do |
7 | | -get introduced in software development. Such bugs sometimes can lead to issues |
8 | | -like information disclosures, memory leak or memory corruptions commonly termed |
9 | | -as `memory safety` issues. Underlying cause of memory safety issues can be |
10 | | -anything but eventually such issues surface as invalid memory accesses. |
| 7 | +get introduced. |
| 8 | +A subset of these bugs introduces issues commonly termed "memory safety" |
| 9 | +issues. The root causes of memory safety issues are numerous, but eventually |
| 10 | +such issues surface as memory accesses that are invalid with regard to the |
| 11 | +language specification. |
11 | 12 |
|
12 | | -From program's perspective access to memory happens via pointers and life cycle |
13 | | -of pointers in a program is managed by following |
| 13 | +Programs use pointers to access memory. We refer to the part of a program's |
| 14 | +execution during which accessing an object is valid as its "lifetime". An |
| 15 | +object's lifetime depends on the region of memory it is stored in: |
14 | 16 |
|
15 | | -* Memory allocator - A memory allocator uses operating system facilities to |
| 17 | +* **Heap**: A memory allocator uses operating system facilities to |
16 | 18 | create large memory mappings and expose APIs so that software can manage |
17 | 19 | (allocate and free) memory at their convenience. As an example `malloc` |
18 | | - allocates memory and returns a pointer to allocated memory and `free` frees |
19 | | - the memory so that allocator can recycle that memory. Once a pointer is |
20 | | - created inside a function, it can pass around this pointer to other functions |
21 | | - or libraries and even after memory is freed, passed copies of pointer may |
22 | | - exist in other software components (due to bug) and if accessed can lead to |
23 | | - invalid memory access issues. |
| 20 | + allocates a specified number of bytes and returns a pointer to the |
| 21 | + allocated memory; `free` frees the memory so that the allocator can recycle |
| 22 | + that memory. Depending on the operating system, there might be many similar |
| 23 | + APIs (e.g. `calloc`, `memalign`, etc.), but they are always paired with a |
| 24 | + corresponding `free` call. |
| 25 | + The lifetime of an object on the heap is between `malloc` (or similar) and |
| 26 | + `free`. |
24 | 27 |
|
25 | | -* Stack - Program starts execution with an allocated stack by underlying |
26 | | - execution environment which also ensures that stack pointer register is set |
27 | | - to base of stack memory and compiler derives all memory pointers for stack |
28 | | - objects (inside a function) from the stack pointer register. Program may be |
29 | | - written in a way which can pass around stack pointers to other functions. If |
30 | | - a passed copy of such a stack pointer is saved by another function and can be |
31 | | - accessed by it, then it may lead to stack invalid memory access issues. |
| 28 | +* **Stack**: The operating system allocates a region to be used as the stack of |
| 29 | + the program. When a function is called, a stack frame is created at the top |
| 30 | + of the stack. This is used to hold local variables, which are accessed |
| 31 | + relative to the stack pointer, which designates the top of the stack. |
| 32 | + Upon return from the function, the frame is popped from the stack by |
| 33 | + adjusting the stack pointer. Within a stack frame, the compiler can reuse |
| 34 | + the same memory for local variables that have non-overlapping lifetimes |
| 35 | + (known as stack coloring, implemented e.g. in LLVM StackColoring |
| 36 | + cite:[LLVM_StackColoring]). The lifetime of an object on the stack is |
| 37 | + governed by the programming language's scoping rules (e.g. C++ scoping rules |
| 38 | + cite:[CppScope]). |
32 | 39 |
|
33 | | -* Globals - Pointers to global memory are valid throughout the runtime of the |
34 | | - program and thus invalid memory access to global memory is not possible. |
35 | | - Although if a program is accessing global memory in a way that can lead to |
36 | | - out of bounds access, that can be leveraged to access address space of the |
37 | | - program which otherwise wasn’t possible. |
| 40 | +* **Globals**: Storage for global variables is allocated by the operating system |
| 41 | + on process startup (or when the dynamic object is loaded). Generally, the |
| 42 | + storage for globals is destroyed on process shutdown (depending on the OS, |
| 43 | + `dlclose` might cause it to be destroyed earlier). |
| 44 | + The lifetime of a global ends when the process is terminated, or the dynamic |
| 45 | + object is unloaded. |
38 | 46 |
|
39 | | -Two prominent patterns of invalid memory accesses are following (but not |
40 | | -limited to) |
| 47 | +In addition, an object has a size that represents the amount of memory |
| 48 | +allocated for it. Unless it is reassigned, a pointer must always point to |
| 49 | +memory within the object it was assigned to point to. |
41 | 50 |
|
42 | | -* Stale pointer accesses (temporal): Memory was freed (heap) or became out of |
43 | | - scope (stack) and due to a bug in software, pointer is still kept around and |
44 | | - reachable via some control flow. This can be termed as use after free or out |
45 | | - of scope accesses. With respect to heap memory such issues are commonly |
46 | | - called `use after free` (UAF). In case of stack, such issues are termed as |
47 | | - `stack use after return`, `stack use after scope`. |
| 51 | +Two prominent patterns of invalid memory accesses are: |
48 | 52 |
|
49 | | -* Incorrect pointer construction (spatial): Due to bug in bounds checking, an |
50 | | - incorrect pointer can be constructed and thus leading to invalid memory |
51 | | - access. Similarly due to a bug in arithmetics (integer overflow) or bad |
52 | | - casting (signed to unsigned), bad offsets can be generated and eventually |
53 | | - leading to incorrect pointer construction and out of bounds accesses. More |
54 | | - commonly they are termed `stack` or `heap buffer overflows`. |
| 53 | +* **Temporal**: A pointer is dereferenced past the lifetime of the object |
| 54 | + it refers to. With respect to heap memory, such issues are commonly |
| 55 | + called "use after free" (UAF). In the case of the stack, such issues are |
| 56 | + termed "stack use after return" or "stack use after scope". |
55 | 57 |
|
56 | | -To catch such inadvertent programming errors, performing memory validity check |
57 | | -at access time is required. When the memory is valid to access (as an example |
58 | | -after allocation in `malloc` or on a function call for stack locals), |
59 | | -software can associate a `tag` value with the memory, store assigned `tag` in a |
60 | | -special memory (tag storage) and annotate that `tag` in high unused bits of the |
61 | | -pointer. This ensures that whenever the pointer is passed around, pointer is |
62 | | -always tagged with `tag` assigned at the time when memory became valid. When |
63 | | -memory is not allowed to be accessible (as an example in when memory is freed |
64 | | -using `free` or stack locals are out of scope after a function has returned) |
65 | | -then software can change assigned `tag` in tag storage. On a memory access (i.e |
66 | | -pointer dereference), `tag` in pointer is checked against stored `tag` for that |
67 | | -memory and if tags do not match then it is due to a memory safety bug in |
68 | | -program. |
| 58 | +* **Spatial**: A pointer outside of the bounds of the object it refers to |
| 59 | + can be created, leading to an invalid memory access when dereferenced. This is |
| 60 | + commonly referred to as a "buffer overflow" (sometimes also "buffer |
| 61 | + underflow"), or "out of bounds". |
69 | 62 |
|
70 | | -Existing software mechanisms using compiler assisted software instrumentation |
71 | | -and runtime support exist: |
| 63 | +To catch these issues, we can check memory accesses for validity at runtime. |
| 64 | +We do this by employing a lock and key scheme. Memory is assigned a memory tag |
| 65 | +(lock). Each pointer is assigned a pointer tag (key) in its unused top bits. |
| 66 | +On access, the key needs to match the lock, or the program is terminated. |
72 | 67 |
|
73 | | -* HWAddressSanitizer (HWAsan) cite:[HWASAN] which uses pointer-masking. |
74 | | -* AddressSanitizer (ASan) cite:[ASAN] which is fully implemented in software. |
| 68 | +When the lifetime of an object starts (e.g. `malloc` is called), we assign the |
| 69 | +same tag both as the memory tag (lock) for the `size` bytes representing the |
| 70 | +object and as the pointer tag (key). Subsequent in-bounds accesses using that |
| 71 | +pointer are valid. When the lifetime of the object ends (e.g. `free` is |
| 72 | +called), we change the memory tag (lock). Subsequent accesses through the |
| 73 | +pointer (which remains unchanged) will be invalid and terminate the program. |
| 74 | + |
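The lock and key scheme described above can be illustrated with a small software model. The following is only a sketch under stated assumptions, not the mechanism this document specifies: the 16-byte granule, the 8-bit tag kept in the top byte of a 64-bit value, the 256-byte toy arena, and the names `tagged_alloc`, `tagged_free`, `access_ok` and `checked_load` are all hypothetical and chosen for illustration. A failed check returns 0 here instead of terminating the program, so that the behaviour can be observed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GRANULE    16u   /* bytes covered by one memory tag (assumed) */
#define ARENA_SIZE 256u  /* size of the toy heap (assumed) */
#define TAG_SHIFT  56    /* pointer tag lives in the top byte (assumed) */

static uint8_t arena[ARENA_SIZE];              /* toy heap, zero-initialized */
static uint8_t mem_tag[ARENA_SIZE / GRANULE];  /* tag storage: one lock per granule */
static size_t  next_free;                      /* bump-allocator cursor */
static uint8_t next_tag = 1;                   /* tag 0 means "never allocated" */

typedef uint64_t tagged_ptr;  /* arena offset in the low bits, key in the top byte */

uint8_t fresh_tag(void) {
    uint8_t t = next_tag++;
    if (next_tag == 0) next_tag = 1;  /* skip the reserved tag 0 on wrap-around */
    return t;
}

uint8_t key_of(tagged_ptr p)    { return (uint8_t)(p >> TAG_SHIFT); }
size_t  offset_of(tagged_ptr p) { return (size_t)(p & ((UINT64_C(1) << TAG_SHIFT) - 1)); }

/* An access is valid only if the pointer's key matches the granule's lock. */
int access_ok(tagged_ptr p) {
    size_t off = offset_of(p);
    return off < ARENA_SIZE && mem_tag[off / GRANULE] == key_of(p);
}

/* Start of lifetime: set the lock on every granule of the object and
 * return a pointer carrying the matching key. Returns 0 on failure. */
tagged_ptr tagged_alloc(size_t size) {
    size_t granules = (size + GRANULE - 1) / GRANULE;
    if (size == 0 || next_free + granules * GRANULE > ARENA_SIZE) return 0;
    uint8_t tag = fresh_tag();
    memset(&mem_tag[next_free / GRANULE], tag, granules);
    tagged_ptr p = ((uint64_t)tag << TAG_SHIFT) | next_free;
    next_free += granules * GRANULE;
    return p;
}

/* End of lifetime: change the lock; existing pointers keep their old key. */
void tagged_free(tagged_ptr p, size_t size) {
    assert(access_ok(p) && "free through a stale or invalid pointer");
    size_t granules = (size + GRANULE - 1) / GRANULE;
    memset(&mem_tag[offset_of(p) / GRANULE], fresh_tag(), granules);
}

/* Checked load: writes the byte to *out and returns 1 only if the check passes. */
int checked_load(tagged_ptr p, uint8_t *out) {
    if (!access_ok(p)) return 0;
    *out = arena[offset_of(p)];
    return 1;
}
```

In this model, after `a = tagged_alloc(32)` and `b = tagged_alloc(16)`, the access `a + 31` passes the check, `a + 32` fails it (a spatial error that lands in a granule with a different lock), and any access through `a` after `tagged_free(a, 32)` fails it (a temporal error). Because one tag covers a whole granule, an overflow that stays inside the object's last granule is not detected by this sketch.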
| 75 | +Software mechanisms that use compiler instrumentation and runtime support |
| 76 | +already exist: |
| 77 | + |
| 78 | +* HWAddressSanitizer (HWASan) cite:[HWASAN], which uses pointer masking to |
| 79 | + implement the idea explained above. |
| 80 | +* AddressSanitizer (ASan) cite:[ASAN], which is fully implemented in software |
| 81 | + but works differently from the idea above. |
75 | 82 |
|
76 | 83 | However software approach has significant cost from performance perspective and |
77 | 84 | are mostly limited to unit test, developer test or special test harness (like |
|