Description
This is a good project for contributors, and can be used as a student project.
Problem
The MarkCompact plan implements a LISP2-style mark-compact algorithm. It has four major phases:
- Marking. This is well-parallelized.
- Calculating forwarding addresses. Currently this is single-threaded.
- Updating references. This is well-parallelized.
- Compaction. Currently this is single-threaded.
Consequently, the major bottlenecks of the MarkCompact plan are the calculation of forwarding addresses and the compaction, as shown in the following timeline plot.
The stop-the-world (STW) time gets worse as the heap size increases.
Solution
Flood et al. describe methods for parallelizing the calculation of forwarding addresses and the compaction in their paper Parallel Garbage Collection for Shared Memory Multiprocessors.
Calculating forwarding addresses
To calculate forwarding addresses in parallel, we need to partition the MarkCompactSpace into regions (which can, in theory, be of any size, such as 32 KB blocks or 4 MB chunks).
- For each live object in each region, calculate the offset from the destination of the first live object in that region to the destination of the current live object. Meanwhile, sum up the sizes of all live objects in the current region. An object that spans multiple regions is counted in the region that contains its starting address, even if it is the last object in that region.
- Iterate through all regions. For each region, record the sum of the live object sizes of all preceding regions (an exclusive prefix sum).
- For each live object in each region, add the region's recorded sum to the object's offset. The result is the forwarding address of that object.
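The three passes above can be sketched as follows. This is a hypothetical, simplified model (not MMTk's actual code): each region is just a list of live object sizes, and only the middle pass is inherently sequential, so the per-region passes can each be handed to a worker thread.

```rust
// Sketch of the three-pass parallel forwarding-address calculation.
// Each inner Vec holds the sizes of the live objects in one region.
fn compute_forwarding(regions: &[Vec<usize>]) -> Vec<Vec<usize>> {
    // Pass 1 (parallelizable, one region per worker): compute each object's
    // offset within its region, and the total live bytes of each region.
    let mut offsets: Vec<Vec<usize>> = Vec::new();
    let mut region_live: Vec<usize> = Vec::new();
    for objs in regions {
        let mut off = Vec::new();
        let mut cursor = 0;
        for &size in objs {
            off.push(cursor);
            cursor += size;
        }
        offsets.push(off);
        region_live.push(cursor);
    }

    // Pass 2 (sequential, but cheap: one entry per region): exclusive
    // prefix sum over the per-region live sizes.
    let mut base = Vec::with_capacity(region_live.len());
    let mut sum = 0;
    for live in &region_live {
        base.push(sum);
        sum += live;
    }

    // Pass 3 (parallelizable, one region per worker): forwarding address
    // = sum of live bytes in preceding regions + offset within region.
    for (r, off) in offsets.iter_mut().enumerate() {
        for o in off.iter_mut() {
            *o += base[r];
        }
    }
    offsets
}

fn main() {
    // Two regions: the first holds objects of 16 and 32 bytes, the second
    // a single 8-byte object.
    let fwd = compute_forwarding(&[vec![16, 32], vec![8]]);
    println!("{:?}", fwd); // [[0, 16], [48]]
}
```

Only pass 2 touches global state, and its work is proportional to the number of regions rather than the number of objects, which is why this structure parallelizes well.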
The Compressor plan employs similar tricks, but it uses bitmaps, while the LISP2 algorithm stores a forwarding word before each object.
Compacting
It is infeasible to slide all objects to one end of the heap using multiple threads. Instead, the algorithm of Flood et al. divides the space into multiple partitions. (Note that Flood et al. call them "regions" in the paper, but since they are not necessarily powers of two in size or alignment, we don't call them "regions" here. They also don't necessarily have the same size as the regions chosen for calculating forwarding addresses; they are usually coarser-grained.) We compact the objects in each partition to one end of it, leaving a contiguous range of free space in each partition. We could also do what Flood et al. did, i.e. slide the objects of adjacent partitions to opposite ends, leaving one combined gap in the middle. The more partitions we have, the easier it is to parallelize, but the more fragmented the MarkCompactSpace will be. However, given that we only allocate objects smaller than 8 KB into the MarkCompactSpace (objects larger than that threshold go to the LargeObjectSpace), it shouldn't be too problematic to have multiple contiguous ranges of free memory separated by allocated objects.
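The per-partition sliding can be sketched like this. The model is hypothetical (partitions are given as a base address plus a list of `(old_addr, size)` live objects, not MMTk's real data structures), and each partition is compacted by its own thread since partitions are independent:

```rust
use std::thread;

// Compact each partition independently: slide its live objects to the
// partition's start, leaving one contiguous free range at its end.
// Input per partition: (base_address, live objects as (old_addr, size)).
// Output per partition: the objects' new (addr, size) after sliding.
fn compact_partitions(
    partitions: Vec<(usize, Vec<(usize, usize)>)>,
) -> Vec<Vec<(usize, usize)>> {
    let handles: Vec<_> = partitions
        .into_iter()
        .map(|(base, objs)| {
            // One worker per partition; no shared state, so no locking.
            thread::spawn(move || {
                let mut cursor = base;
                let mut out = Vec::new();
                for (_old_addr, size) in objs {
                    out.push((cursor, size)); // destination after sliding
                    cursor += size;
                }
                out
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    // Two partitions starting at addresses 0 and 64.
    let new_layout = compact_partitions(vec![
        (0, vec![(4, 8), (20, 8)]),
        (64, vec![(70, 16)]),
    ]);
    println!("{:?}", new_layout); // [[(0, 8), (8, 8)], [(64, 16)]]
}
```

Because no object ever crosses from one partition into another, the workers never write into each other's address ranges, which is what makes this safe to run fully in parallel.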
Things to consider
We need to handle object alignment properly. If the first object of a region needs special alignment, we need to align up the "sum of live object sizes in previous regions", too.
Although MarkCompact usually performs poorly, we keep it because it is a canonical algorithm. Mark-compact collectors have interesting properties such as high space utilization and preservation of allocation order. We should make it configurable how finely we partition the space during forwarding address calculation and compaction. Specifically, if we force it to use only one region/partition, it should behave like the current sequential algorithm.