Commit 201d74e

Reword intro. (#48)
This adds a bit more precision and logical flow to the document.
1 parent 361a546 commit 201d74e

2 files changed: +77 -58 lines changed

src/mte.bib

Lines changed: 12 additions & 0 deletions
@@ -21,3 +21,15 @@ @electronic{ZIMOP
 url = {https://github.com/riscv/riscv-isa-manual/blob/main/src/zimop.adoc},
 year = {}
 }
+
+@electronic{LLVM_StackColoring,
+title = {LLVM StackColoring.cpp},
+url = {https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/StackColoring.cpp},
+year = {}
+}
+
+@electronic{CppScope,
+title = {cppreference.com - Scope},
+url = {https://en.cppreference.com/w/cpp/language/scope.html},
+year = {}
+}

src/mte_intro.adoc

Lines changed: 65 additions & 58 deletions
@@ -1,77 +1,84 @@
 [[intro]]
 == Introduction

-Significant production deployed software is written in languages which provide
-powerful primitives (like C/C++) to software developers to manage memory.
+Significant production software is written in languages which provide
+powerful memory management primitives (like C/C++).
 Software is written in collaborative fashion and on tight deadlines and bugs do
-get introduced in software development. Such bugs sometimes can lead to issues
-like information disclosures, memory leak or memory corruptions commonly termed
-as `memory safety` issues. Underlying cause of memory safety issues can be
-anything but eventually such issues surface as invalid memory accesses.
+get introduced.
+A subset of these bugs introduce issues commonly termed as "memory safety"
+issues. The root causes of memory safety issues are numerous but eventually
+such issues surface as invalid memory accesses with regard to the
+language specification.

-From program's perspective access to memory happens via pointers and life cycle
-of pointers in a program is managed by following
+Programs use pointers to access memory. We refer to the parts of the program
+in which accessing an object is valid as its "lifetime". An object's
+lifetime depends on the region of memory it is stored on:

-* Memory allocator - A memory allocator uses operating system facilities to
+* **Heap**: A memory allocator uses operating system facilities to
 create large memory mappings and expose APIs so that software can manage
 (allocate and free) memory at their convenience. As an example `malloc`
-allocates memory and returns a pointer to allocated memory and `free` frees
-the memory so that allocator can recycle that memory. Once a pointer is
-created inside a function, it can pass around this pointer to other functions
-or libraries and even after memory is freed, passed copies of pointer may
-exist in other software components (due to bug) and if accessed can lead to
-invalid memory access issues.
+allocates a specified number of bytes and returns a pointer to the
+allocated memory; `free` frees the memory so that the allocator can recycle
+that memory. Depending on the operating system, there might be many similar
+APIs (e.g. `calloc`, `memalign`, etc.), but they are always paired with a
+corresponding `free` call.
+The lifetime of an object on the heap is between `malloc` (or similar) and
+`free`.

-* Stack - Program starts execution with an allocated stack by underlying
-execution environment which also ensures that stack pointer register is set
-to base of stack memory and compiler derives all memory pointers for stack
-objects (inside a function) from the stack pointer register. Program may be
-written in a way which can pass around stack pointers to other functions. If
-a passed copy of such a stack pointer is saved by another function and can be
-accessed by it, then it may lead to stack invalid memory access issues.
+* **Stack**: The operating system allocates a region to be used as the stack of
+the program. When a function is called, a stack frame is created at the top
+of the stack. This is used to hold local variables, which are accessed
+relative to the stack pointer, which designates the top of the stack.
+Upon return from the function, the frame is popped from the stack by
+adjusting the stack pointer. Within a stack frame, the compiler can reuse
+the same memory for local variables that have non-overlapping lifetimes
+(known as stack coloring, implemented e.g. in LLVM StackColoring
+cite:[LLVM_StackColoring]). The lifetime of an object on the stack is
+governed by the programming language's scoping rules (e.g. C++ scoping rules
+cite:[CppScope]).

-* Globals - Pointers to global memory are valid throughout the runtime of the
-program and thus invalid memory access to global memory is not possible.
-Although if a program is accessing global memory in a way that can lead to
-out of bounds access, that can be leveraged to access address space of the
-program which otherwise wasn’t possible.
+* **Globals**: Storage for global variables is allocated by the operating system
+on process startup (or when the dynamic object is loaded). Generally, the
+storage for globals is destroyed on process shutdown (depending on the OS,
+`dlclose` might cause it to be destroyed earlier).
+The lifetime of a global ends when the process is terminated, or the dynamic
+object is unloaded.

-Two prominent patterns of invalid memory accesses are following (but not
-limited to)
+In addition, an object has a size that represents the amount of memory
+allocated for it, and except for reassignment, a pointer must always point to
+memory of the object it was previously assigned to point to.

-* Stale pointer accesses (temporal): Memory was freed (heap) or became out of
-scope (stack) and due to a bug in software, pointer is still kept around and
-reachable via some control flow. This can be termed as use after free or out
-of scope accesses. With respect to heap memory such issues are commonly
-called `use after free` (UAF). In case of stack, such issues are termed as
-`stack use after return`, `stack use after scope`.
+Two prominent patterns of invalid memory accesses are:

-* Incorrect pointer construction (spatial): Due to bug in bounds checking, an
-incorrect pointer can be constructed and thus leading to invalid memory
-access. Similarly due to a bug in arithmetics (integer overflow) or bad
-casting (signed to unsigned), bad offsets can be generated and eventually
-leading to incorrect pointer construction and out of bounds accesses. More
-commonly they are termed `stack` or `heap buffer overflows`.
+* **Temporal**: A pointer is dereferenced past the lifetime of the object
+it refers to. With respect to heap memory, such issues are commonly
+called "use after free" (UAF). In case of stack, such issues are termed as
+"stack use after return", "stack use after scope".

-To catch such inadvertent programming errors, performing memory validity check
-at access time is required. When the memory is valid to access (as an example
-after allocation in `malloc` or on a function call for stack locals),
-software can associate a `tag` value with the memory, store assigned `tag` in a
-special memory (tag storage) and annotate that `tag` in high unused bits of the
-pointer. This ensures that whenever the pointer is passed around, pointer is
-always tagged with `tag` assigned at the time when memory became valid. When
-memory is not allowed to be accessible (as an example in when memory is freed
-using `free` or stack locals are out of scope after a function has returned)
-then software can change assigned `tag` in tag storage. On a memory access (i.e
-pointer dereference), `tag` in pointer is checked against stored `tag` for that
-memory and if tags do not match then it is due to a memory safety bug in
-program.
+* **Spatial**: A pointer outside of the bounds of the object it refers to can
+be created, leading to invalid memory access when dereferenced. This is
+commonly referred to as a "buffer overflow" (sometimes also "buffer
+underflow"), or "out of bounds".

-Existing software mechanisms using compiler assisted software instrumentation
-and runtime support exist:
+To catch these issues, we can check memory accesses for validity at runtime.
+We do this by employing a lock and key scheme. Memory is assigned a memory tag
+(lock). Each pointer is assigned a pointer tag (key) in its unused top bits.
+On access, the key needs to match the lock, or the program is terminated.

-* HWAddressSanitizer (HWAsan) cite:[HWASAN] which uses pointer-masking.
-* AddressSanitizer (ASan) cite:[ASAN] which is fully implemented in software.
+When the lifetime of an object starts (e.g. `malloc` is called), we assign the
+same tag as a memory tag (lock) for the `size` bytes representing the object,
+and pointer tag (key). Subsequent in-bounds accesses using that pointer are
+valid. When the lifetime of the object ends (e.g. `free` is called), we change the
+memory tag (lock). Subsequent accesses through the pointer (which remains
+unchanged) will be invalid and terminate the program.
+
+Existing software mechanisms using compiler instrumentation and runtime
+support exist:
+
+* HWAddressSanitizer (HWAsan) cite:[HWASAN] which uses pointer-masking to
+implement the idea explained above.
+* AddressSanitizer (ASan) cite:[ASAN] which is fully implemented in software,
+but in a different way to the idea above.

 However software approach has significant cost from performance perspective and
 are mostly limited to unit test, developer test or special test harness (like
