This page describes the functional requirements for supporting the CoreCLR GC in the LLILC codebase, the GC support currently offered by LLVM, the problems that arise mapping the CoreCLR requirements onto what LLVM supports, and suggested plans for building out functionality in LLILC.
The CoreCLR's GC normally runs as a generational, fully relocating, precise, stop-the-world collector. The GC supports weak, pinning and interior pointers. Exterior pointers are not supported. Code may be required to be fully interruptible or may be partially interruptible. The GC in the CoreCLR also supports a conservative mode which greatly simplifies the obligations of the JIT.
For the full CoreCLR GC, a JIT has several responsibilities:
- Use write barriers for stores of GC pointers or value classes with GC pointer members (typically via helper calls to the runtime).
- Determine the set of live GC roots in stack slots and registers. Slots may be untracked or tracked. All slots must be properly initialized, if necessary, before the first safepoint.
- Ensure no exterior pointer is live at a safepoint.
- Track whether each reference is an object, interior, or pinned reference. Special reporting may also be needed for this pointers in methods that have them.
- Ensure that there are enough safepoints along every possible dynamic execution path to avoid long or indefinite GC pauses.
- Determine if there are any regions where GC is not supported, for instance parts of some prologs and epilogs or the middle of some quasi-atomic instruction sequence. Make sure there are no safepoints in such regions.
- In some cases, ensure that special pointers (pinned pointers, this pointers, generic contexts) are kept alive throughout a suitable region of code by reporting them at the requisite safepoints. This keep alive region might extend beyond the normal range determined by liveness analysis.
- Fill in a GC Info object describing the above information for use by the runtime. Note that in the CoreCLR the reporting format is architecture dependent and may require various encoding techniques.
- Ensure that correct transitions are made between managed and unmanaged (i.e. GC-unaware) code.
In conservative mode only 9. applies, but:
- The JIT must ensure that at a safepoint, each live object with a root in the current stack frame is referenced by at least one object or interior pointer (more specifically, this excludes live objects that are only referenced by exterior pointers).
There are some related JIT responsibilities that help ensure type safety for verifiable programs - for instance GC pointer updates in memory must be atomic so that no other thread can observe a partially updated pointer. Those responsibilities are outside the scope of this document.
LLVM currently supports a model for precise GC known as GCRoot. There is a new model under development called Statepoints. We'll discuss each of these in turn.
In the GCRoot
approach, each
location that might contain a GC pointer is reported in the method prolog via
an llvm.gcroot
intrinsic call. This call effectively makes these locations
escape.
Safepoints are modelled as calls to functions that can read and write escaped
memory. Thus no GC pointer can be kept live in registers across a safepoint,
since its value might change across the call. Also, this means only stack
slots are reportable in the GC info. All GCRoot
slots are considered live at
all safepoints, so the GC info reflects that these are untracked locations.
All reported slots must be initialized in the prolog or at a suitable early
point. It is up to the compiler to ensure that all reportable locations that
live across a safepoint are described by a GCRoot
call - this may include
temporaries from complex expression trees. Explicit nulling may be needed
after temporaries die to ensure these temporary slots don't hold onto objects
longer than necessary. It's up to the IR creator to ensure all this is done
correctly.
Eventually the llvm.gcroot
is lowered out as a normal alloca, and there
doesn't seem to be any special protection left after that point.
GCRoot
includes support for late insertion of safepoints. However this
functionality appears to not be fully implemented. GCRoot
supports custom
encoding of the GC information.
Philip Reames at Azul Systems has written up a couple of objections to the
GCRoot
approach and suggested an alternative he called
Statepoints
.
Part of Reames' motivation was better support for a precise, fully relocating
collector, something that is also required for CoreCLR's GC. Given that
Statepoints
provide a useful superset of what GCRoot provides, and that
the work is ongoing and still a bit malleable, it seems prudent to build upon
this work for supporting the CoreCLR GC in LLILC.
Statepoints
essentially allow for tracked location reporting. Each safepoint
can describe a different set of locations to report to the GC, including
registers. The per-method GC read-write potential at safepoints is made
manifest in the SSA form by explicit addition of the live-in GC pointers and
live-out GC pointers. Safepoints also allow reporting derived pointers.
The reliance on SSA for GC liveness reporting at safepoints means that GC
pointers found on the stack at safepoints (say from fields in value types)
cannot be reported.
There is now experimental LLVM support for parts of the Statepoint
proposal.
Current plans for finishing the work would not include custom formatting;
instead the results appear in the LLVM
StackMap format.
Statepoints
also allow for late insertion of safepoints, though how this
happens is still being worked out. One aspect of this is to use LLVM's
address spaces to distinguish GC pointers from ordinary pointers. In
particular, GC pointers are required to be in addrspace(1)
or be in an
address space with some distinguishing GC attribute. Then identifying the set
of necessary safepoints and liveness analysis could drive wiring up the
SSA uses and definitions. We have adopted this address space convention in
LLILC.
Statepoints
also support the notion of statepoints which are transitions
between code that uses different GC modes ("GC transitions"). LLILC can
(and does) make use of GC transitions to support calls from managed to
unmanaged code.
Given the aspects of GC that LLVM supports and the features that the CoreCLR GC requires, here are the open issues in using LLVM to generate correct and complete GC Info.
GC pointers can get lost if they're cast to non-pointer types and then cast
back, or cast to non-GC pointer address spaces and then cast back. However,
the consistent use of addrspace(1)
for GC pointers should inhibit most
of this, since LLVM is aware that pointer representations may change when a
pointer is cast from one address space to another, so these casts must be
done explicitly via addrspacecast
. Violations are flagged by IR constructors
and checkers. So for the most part we'll just need to audit places where an
ordinary pointer or non-pointer is cast to addrspace(1)
- there should
be very few legitimate cases of this.
Once the GC dependence of GC pointers on safepoints is made explicit in the IR, a fragile invariant arises. A new derived pointer can be created before a safepoint and used after it without apparent error. It seems plausible that we can check for this by running a second GC pointer liveness pass downstream after the point safepoints are made manifest; in this framework, no GC pointer should be live across a safepoint.
Value types in the CLR may contain GC pointer-valued fields. The current
Statepoint
proposal does not support reporting GC pointers from non-SSA
sources.
While the use of addrspace (1)
or similar convention helps locate GC
pointers, it does not distinguish between
object and interior pointers
We need to find a way to do this since we must report these differently.
Initially, at some performance cost, we can report all references as interior pointers.
Note that it is not possible in general to use derived pointer reporting for interior pointers. Interior pointers may be passed as arguments to calls and returned as results by function calls, and in these cases the callee has no way of reporting the base pointer. Furthermore, since the default assumption for parameters is that they are object pointers, we'll need some sort of metadata or similar to distinguish cases where they're actually interior pointers.
To report interior pointers the GC can keep track of starting addresses of all objects such that if you point within an object it can find the base address by searching for the nearest pointer that represents the start of the object.
The case of exterior pointers requires additional information to determine which object the pointer is pointing to, since if they point into an object at a higher address the GC can't tell which object is that pointer's base.
It seems likely that LLVM's optimization passes might create exterior pointers in places (e.g. pointing before or after the object). The CoreCLR GC can't tolerate these, so we'll need to ensure that no exterior pointer is live at a safepoint.
Pinned pointers must be reported live and pinned for at least the duration of the pin. If a pinned pointer is copied, at least one of the references must be reported as pinned at each safepoint in the pinned range.
Translating these constraints into IR has proven tricky in other compilers we've worked with. The problem with pinned pointers from the compiler standpoint is that the uses of the pinned objects aren't data-dependent on them. Pinned is an attribute of a local (of managed pointer type) in the MSIL. The sequence is something like
pinnedLocal.mp = ...
otherLocal.up = convert pinnedLocal.mp
... uses of otherLocal.up ...
pinnedLocal.mp = null
The constraint is something like "given any use that is transitively
data-dependent on otherLocal
(with maybe a special case to break out if
otherLocal
gets converted back to a reported pointer), if that use
gets moved passed a store to pinnedLocal
then it can't further get moved
past a safepoint". We've never seen a clearly written contract about this
(that takes into account optimizer freedom), so we don't know what's
guaranteed if somebody takes the address of pinnedLocal
and the compiler
can't tell which calls or indirect stores might or might not update it.
As noted, the current Statepoint
work does not support custom encoding
formats. Adding this
seems straightforward, but we might push to have the
base layer provide a more customizable foundation.
Neither GCRoot
nor Statepoints
support fully interruptible GC Info.
We'll have to find out how crucially the CoreCLR depends upon this feature.
It seems plausible we can simply support partially interruptible GC.
Depending on how EH features are implemented, we may need custom support for reporting GC info from funclets.
Here are the open issues in using LLVM to generate performant code in the presence of GC references and reporting rules. Most of these are not urgent but some might be good jumping-off points for community engagement.
Untracked locations require zero-initialization in the prolog even if the slots being initialized are conditionally live at that point. In certain pathological cases (e.g. a method with a large switch where each switch arm introduces GC reference locals) the overhead of this zeroing can become a serious performance issue.
Ideally we'd be able to keep GC pointers in registers across calls. The current
Statepoint
work supports this in abstract but in practice all the GC
pointers are spilled to memory at call sites.
Naturally there are some follow-on complications.The GC must be aware of the spill locations of callee save registers, since they might contain GC pointers from the caller and the location of these is only known to the callee. Location info for spills may be obtainable from the regular unwind data or might require a custom reporting format.
If a callee save is spilled to the stack across a safepoint, the callee save must be restored from the stack, even if the register spilled has not been modified along the path leading to the restore. This handles the case where the register is not live at the safepoint, and the spilled value is updated by the GC. For example:
spill RBX
br i1 %p, label %1, label %2
;<label>:1
...modify RBX
br label %3
;<label>:2
call[safepoint] f()
br label %3
;<label>:3
restore RBX
ret
Here RBX
is not live at the call to f()
and so does not need to be
reported at the safepoint. An optimizer might choose to tail-duplicate
leading to the following code, where at label 2
it appears the restore
of RBX
is unnecessary.
spill RBX
br i1 %p, label %1, label %2
;<label>:1
...modify RBX
restore RBX
ret
;<label>:2
call[safepoint] f() // GC might modify spilled RBX here
restore RBX // so this restore is not redundant
ret
In methods that might call into unmanaged code, the compiler must ensure that either a. no callee saves contain GC references at unmanaged call sites, or b. any callee saves that do contain GC references at such call sites are spilled to known locations that can be found and (if the gc is relocating) reliably updated.
Note this might mean spilling the callee save even if it's not used in the method, since the callee has no way of knowing if that register contains a GC pointer or not (preferably shrink wrapping the spill so it happens only if the unmanaged calls are actually going to happen).
Ideally GC pointers that require zero initialization in the prolog are laid out contiguously on the stack so that they can be zeroed by a single memset or similar.
Ideally GC pointers that require stack locations would be packed into the smallest number of stack slots possible, both to minimize frame size and to reduce the amount of up-front zeroing that is needed.
Initialization of GC pointers to null and spills of callee saves should be deferrable.
Both the GCRoot
and Statepoint
approaches rely on broad alias escapes to
indicate that GC references might change. In GCRoot
this is used to indicate
all GC references (including those on the local frame) while in Statepoint
it
is used currently just for heap and static references. A more refined approach
would be to have the alias model establish a distinguished region of
relocatable references and have safepoints only alias those. This would allow
calls that have side effects but can't trigger GC to bypass safepoint reporting.
It is common to assign null to a GC pointer when it's no longer needed locally, to avoid holding onto objects that aren't needed. Without some precautions these stores of null may appear as dead stores since there are typically no apparent reads downstream. The code generator can remove the store and trim the lifetime for tracked locations, but must keep the store for untracked locations.
Locations known to contain null pointers don't need to be reported to the GC.
Write barriers can impose significant performance overhead. In some cases this overhead can be mitigated by optimizations:
- Removing unnecessary write barriers - for instance, no barriers are necessary on newly created objects that haven't crossed a safepoint. Note this will entail code-motion restrictions; a store with an elided barrier cannot cross a safepoint.
- Write barrier calls may be inlined or tailored to mesh with the allocator's needs. The actual barrier typically has fairly low register cross section, so customizing its register usage during register allocation is appealing. If the barrier is inlined, special care must be taken to prevent subsequent code motion.
- Merged barriers - the underlying GC may track suspect regions of older generations at page or larger granularity, so stores to contiguous or locally dense regions may be handled with a single write barrier. This can be especially useful in copy loops. No safepoint can appear between the actual writes and the ultimate reporting barrier.
- Storing null to a location may not require a write barrier.
- It is often beneficial to defer lowering of stores requiring write barriers into helper calls so that upstream optimizations can deal with them more or less as normal stores.
Once safepoints are inserted, a number of legal optimizations may no longer be possible. For instance:
- The non-GC fields of a class are not modified at safepoints, so it should be safe to CSE loads of such fields across safepoints.
- Subsequent optimization may dead-code a GC pointer use below a safepoint. We'd like to be able to propagate this above the safepoint to remove even more code, but would be blocked by the apparent use and definition at the safepoint.
These considerations may bias us towards late insertion of safepoints; however, it may be advisable to insert safepoints early in some places (loops, for instance) so that the loop optimizer can observe the need for a safepoint and plan accordingly.
At this point we are very early in the development process, so parts of these plans are tentative and somewhat vague. We welcome the advice and participation of the community in helping us to refine and improve our plans.
We currently create a GC info with no locations reported. We use
addrspace(1)
to mark GC pointers in the IR, but will have no way of knowing
if we're doing it properly. We'll only run small tests that do not trigger GCs.
This will be sufficient for building up most of the other Jit functionality.
Initially, we'll enable the conservative mode of the CoreCLR GC. We've taken a preliminary look and this seems to work well enough, but we need to do further vetting. Alternatively we can adjust the GC's segment size to forestall GC as long as possible. This should allow us to run a substantial number of tests.
If for some reason this doesn't pan out, we can also look into modifying the GC segements sizes to defer GC as long as possible.
This will be an initial implementation using Statepoints
. We'll have to
implement custom encoding (We can leverage the
GC encoding library
that the CoreCLR provides for this) and
find some method
of distinguishing interior pointers from object pointers. This will possibly
be a dataflow analysis along with some hinting from LLVM metadata or similar.
We'll have to find some way to report
GC fields from aggregates,
maybe by blending some of the GCRoot
support back in for such slots.
As needed we plan to help fill in missing pieces of
the experimental implementation in LLVM. For instance, we'll probably implement
our take on the
late insertion of safepoints.
And we'll likely try and write a
downstream checker
that runs at encoding time
or similar to verify that no unreported GC pointers exist.
At this point we hope to have a functionally correct GC with partially interruptible GC info reported back to the CoreCLR. We can start running GC stress and related tests and see how robust this solution is.
Based on our experiences with Statepoint V1 we'll likely want to work with
the community to enhance the Statepoint
work. The specifics here will
depend on what we've learned and what problems we encounter.
CoreCLR requires the generation of GC Tables that contain pointer
liveness information for each method. To achieve this, we read
the liveness information from the __llvm.stackmaps
section, and
translate it to the CoreCLR format. We obtain the information by
post-processing LLVM generated binary, rather than adding a new
StackMap generator (for CoreCLR format) to LLVM itself.
While this involves a two step translation, it helps avoid:
- Compatibility conflicts because of LLVM core having to change whenever there is a change in CoreCLR.
- Bugs because of changes not correctly propagated to multiple stack-map generators in LLVM.
Generating GC Tables involves the following steps:
- Enable StackMap section generation for COFF in LLVM.
- Parse the
__llvm.stackmaps
section. - Collect certain information about the stack frame (ex: which of the callee save registers are saved) not available in the StackMap section by parsing other sections, if necessary.
- Encode information using CoreCLR's GCInfo Encoder
- Report each call-site (GC safepoint).
- Assign slots IDs for each unique register and stack locations.
- For each slot, report liveness based on the location records
in
__llvm.stackmaps
section.
CoreCLR permits reportng certain slots as "untracked" GC-pointers, for stackmap-size reduction or throughput reasons. However, LLVM currently does not support untracked reporting -- stack-locations cannot be assumed to remain live through entire function, due to transformations like stack coalescing. Therefore, LLILC will report the liveness of all slots as tracked pointers.
The requirements imposed on a code generator by the CoreCLR present new challenges for GC reporting in LLVM, both for correct reporting and for optimization in the presence of GC reporting requirements.
LLILC is a new JIT for the CoreCLR that will leverage LLVM for code generation. We plan to work with the community to enhance LLVM to become a robust and performant platform for managed code generators like LLILC.
A safepoint is a point in code where a GC can safely occur.
A method is partially interruptible if it contains at least one safepoint. A method (or region of a method) is fully interruptible if every instruction boundary is a safepoint.
A GC algorithm works by determining the transitive closure of objects reachable via GC pointers (or GC references) from a set of GC roots. GC roots are typically found in distinct memory regions: static memory, the runtime stacks and register state of active threads, and sometimes in heap memory.
A GC algorithm is conservative if the set of locations it scans for GC roots is a superset of all the possible locations that may contain GC roots. A GC algorithm is precise (or accurate) if the locations is scans for GC roots must exactly match all the possible locations for GC roots. An algorithm may be conservative in some memory regions and precise in others.
A GC algorithm is relocating if during a GC, GC pointers may be updated to refer to the new location of an object. An algorithm update pointers found in certain regions but not others. For instance, GC pointers from the stack might not be updated while GC pointers in the heap might be updatable. If the algorithm relocates pointers from all regions, it is fully relocating. When a GC is relocating pointers from some memory region, it necessarily implies the GC reporting is precise in that region.
A weak pointer or weak reference is a GC pointer that does not keep an object live. These can be set to null during GC, even if a GC is not relocating.
A stackmap or GC map info is used to describe the set of root locations to scan at a safepoint. The map may describe stack locations (via base register + offset) or machine registers. Other information is used to establish the sets of root locations on the heap and in the stack; the code generator is usually not involved in this. The GC Info for a method is the set of all stackmaps and other GC reporting information.
An object or base pointer is a pointer to the base address of a GC collectable object. A managed or interior pointer is a pointer to an address within an object or possibly a pointer to the address just beyond the object. A derived pointer is a pointer that is offset from an object pointer but is logically related to the object pointer. A derived pointer may or may not be an interior pointer. We will use the term exterior pointer to refer to the case where the derived pointer is not an interior pointer -- hence an exterior pointer points beyond the logical extent of the object. Note dereferencing exterior pointers can violate type safety.
A GC collectable object is pinned at a safepoint if it is not allowed to be relocated if a GC occurs at that safepoint.
A root location within a method is tracked if it can be reported as a root at some safepoints and not reported at others (where the determination of where it must be reported being driven by liveness analysis). If no liveness analysis is performed on the root location, or the location is live across the entire method (and hence would be reported in all safepoints) then the location is effectively untracked. Locations must generally be initialized before the first safepoint in the method so that the GC does not encounter garbage values. Untracked locations are typically initialized in the method's prolog. Untracked locations may be reported via separately from safepoints in GC Info.
A GC may support collecting just a subset of objects. Typically this is done by segregating objects by age (a generational GC). This usually requires special attention when writing GC references to memory, via write barriers.
Some GC algorithms support GC running concurrently with code execution. Often this requires special attention to be paid when GC references are read, via read barriers.
To ensure that a GC can be performed promptly when needed, the code generator must ensure that the number of instructions executed between safepoints is bounded. Often this means the code generator will insert additional safepoints (aka GC probes or GC polls) at places in the code.
A managed method is one whose code is created by the JIT or similar compiler that is aware of the special requirements imposed by GC (and more generally the requirements of the managed runtime environment).
A native (or unmanaged) method is a code sequence that is not aware of the special requirements for GC reporting. Typically these are assembly or C++ methods or similar.
A PInvoke (Platform Invoke) is a call to native code from a managed method. A Reverse PInvoke is a call to a managed method from native code.
No. | Implementation | Testing | Issue | Status |
---|---|---|---|---|
1 | Insert GC Safepoints: Run the PlaceSafepoints and RewriteSafepointsForGC phases before Code generation, and ensure that statepoints are inserted and lowered correctly | LLILC tests pass with Conservative GC | 32 | Completed |
2 | Bring GCInfo library to LLILC: Use the GCInfo library to encode Function size correctly; no live pointers reported at GC-safe points | LLILC tests pass with Conservative GC | 30 | Completed |
3 | Report GC liveness: Encode GC pointer liveness information in the CLR format using the GC-Encoding library | A few functions compiled by LLILC with correct GCInfo | 31 | Completed |
4 | Test Pass | CoreCLR tests pass with Precise GC | 670 | Completed |
5 | Add GC-specific stress tests | All existing and new tests pass | 696 | In Progress |
6 | GC Stress testing | Run the LLILC tests in GCStress mode; some GCStress testing running regularly in the lab | ||
7 | Special reporting for pinned pointers | Code with pinned pointers handled by LLILC | 29 | Completed |
8 | Support aggregates containing GC pointers | Code with GC-aggregates handled by LLILC | 33 | Completed |
9 | Fully-Interruptible code: Investigate whether fully interruptible code should be supported | Test and GCStress Pass | 473 | |
10 | Lower Write barriers to Calls late | Test and GCStress Pass | 471 | |
11 | Place Safepoint-polls only where necessary for CoreCLR runtime | Test and GCStress Pass | 425 | |
12 | Track GC-pointers in registers | Test and GCStress Pass | 474 | |
13 | Implement GC Checker | Test and GCStress Pass | 34 | |
14 | Identify Object and Managed pointers differently | Test and GCStress Pass | 28 | |
15 | Implement necessary support to enable Precise GC when LLVM optimizations are turned on for LLILC | Test and GCStress Pass in an optimized LLILC build |