When you compile your .NET program as Native AOT, the whole program and its runtime data structures get compiled to a native representation. There’s no virtual machine to offer various services at runtime. This constrains what the program can do and requires giving up certain conveniences of .NET, such as the ability to load new types and assemblies at runtime (i.e. Reflection.Emit and support for managed plug-ins). But while you lose some things, you gain others: faster startup, less memory use, smaller size, and the ability to optimize your app using a whole program view.
Normally, optimizations in compilers are done on a per-method basis: the compiler looks at what the method does and figures out a way to do that most efficiently using the information seen in the method body and input from the type system. A whole program optimization looks at the entire program, and feeds that information into compilation of individual methods on top of the existing data from the type system and the method body itself. This allows generating better code.
Whole program view depends on knowing what the entire program does at the time the method is compiled: before the compiler starts generating code for the first method, it needs to know what all the methods in the program do. AOT compilers are well positioned to have this knowledge. The constraints that AOT imposes (no ability to generate or load new code at runtime) play well with this.
Here are the top optimizations done by Native AOT in .NET 8 using the whole program view.
1. Sealing classes that are not inherited by any other class
using System.Runtime.CompilerServices;
CallToString(new Base());
// NoInlining so that codegen doesn't inline it into callsite
[MethodImpl(MethodImplOptions.NoInlining)]
static string CallToString(Base b) => b.ToString();
class Base
{
    public override string ToString() => "Base";
}
class Derived : Base
{
    public override string ToString() => "Derived";
}
In the above program, the CallToString method gets compiled into the following CPU instructions:
// Dereference `this` pointer
cmp byte ptr [rcx], cl
// Load address of a "frozen string literal", a System.String instance
// that is allocated in the data segment of the executable (outside GC heap).
lea rax, gword ptr [(reloc 0x422978)]
// Return
ret
Observations:
- Base.ToString was inlined into CallToString, despite this being a virtual call. The generated code performs a null check in the first instruction (to trigger a NullReferenceException if the parameter was null), loads a string literal “Base” and returns.
- This is possible because the whole program view realized that there’s no other class that could be typed as Base than Base itself.
- The program has a Derived class in it that derives from Base and overrides ToString, but it got optimized away because it’s unused.
- If loading new code was supported, this optimization would be illegal because at any point Derived could be loaded or a new class deriving from Base could be created with Reflection.Emit or Assembly.LoadFrom. We’d either need an extra if check to check the type is indeed Base or we’d need a way to invalidate all methods that had wrong assumptions at the point when the assumption becomes invalid.
- At some point in the future, we might also be able to optimize away the null check at the beginning of CallToString because the whole program view can see this method is never called with a null parameter (the whole program view also knows which methods are called indirectly through delegates or reflection, and this method isn’t).
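To make the "extra if check" alternative more concrete, here is a rough hand-written C# sketch of what a guarded version of the call could look like in a world where new subclasses may appear at runtime. The method name CallToStringGuarded is hypothetical, and this only illustrates the idea; it is not what any compiler literally emits.

```csharp
using System;

Console.WriteLine(CallToStringGuarded(new Base()));    // prints "Base"
Console.WriteLine(CallToStringGuarded(new Derived())); // prints "Derived"

// Hypothetical hand-written equivalent of a guarded devirtualization.
static string CallToStringGuarded(Base b)
{
    // Fast path: the exact type is Base, so the body of Base.ToString
    // can be substituted directly. GetType() throws if b is null,
    // which plays the role of the null check.
    if (b.GetType() == typeof(Base))
        return "Base";

    // Slow path: some other type derived from Base, so fall back
    // to an ordinary virtual call.
    return b.ToString();
}

class Base
{
    public override string ToString() => "Base";
}

class Derived : Base
{
    public override string ToString() => "Derived";
}
```

With the closed world assumption, the slow path and the type comparison are unnecessary, which is why the code shown earlier collapses to a null check and a constant load.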
2. Devirtualizing interface calls to interfaces with few implementors
using System.Runtime.CompilerServices;
CallFrob(new Fooer1(), 1, 1);
CallFrob(new Fooer2(), 1, 1);
[MethodImpl(MethodImplOptions.NoInlining)]
static int CallFrob(IFoo foo, int x, int y) => foo.Frob(x, y);
interface IFoo
{
    int Frob(int x, int y);
}
class Fooer1 : IFoo
{
    public int Frob(int x, int y) => x + y;
}
class Fooer2 : IFoo
{
    public int Frob(int x, int y) => x - y;
}
In the above program, CallFrob gets compiled into the following instructions:
// Load MethodTable of Fooer1
lea rax, [(reloc 0x420508)]
// Compare type of first parameter with Fooer1
cmp qword ptr [rcx], rax
// If not equal, jump to IG04
jne SHORT G_M000_IG04
G_M000_IG03:
// Fancy way to add rdx and r8 and put result in eax
lea eax, [rdx+r8]
// Jump to end of method
jmp SHORT G_M000_IG05
G_M000_IG04:
// Move second parameter into eax
mov eax, edx
// Subtract third parameter from eax
sub eax, r8d
G_M000_IG05:
// Return
ret
Observations:
- The code generated for CallFrob is essentially foo.GetType() == typeof(Fooer1) ? x + y : x - y.
- The compilation process figured out there are only two possible types that could be IFoo at runtime. If it’s not Fooer1 (or null, which would have thrown when retrieving type identity), it must be Fooer2.
- Both implementations were small enough to inline so they got inlined.
- If loading new code was supported (JIT case), a fallback option would need to be generated, or we’d need a way to invalidate this code if a new implementation of IFoo is loaded.
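The fallback mentioned in the last point can be sketched by hand in C#. The example below is only an illustration under assumptions: CallFrobWithFallback and Fooer3 are hypothetical names, with Fooer3 standing in for an implementation that shows up after the first two were known.

```csharp
using System;

Console.WriteLine(CallFrobWithFallback(new Fooer1(), 5, 3)); // prints 8
Console.WriteLine(CallFrobWithFallback(new Fooer2(), 5, 3)); // prints 2
Console.WriteLine(CallFrobWithFallback(new Fooer3(), 5, 3)); // prints 15

// Hypothetical open-world version: known implementors are checked first,
// and anything else goes through regular interface dispatch.
static int CallFrobWithFallback(IFoo foo, int x, int y)
{
    Type t = foo.GetType(); // throws NullReferenceException if foo is null
    if (t == typeof(Fooer1)) return x + y; // inlined body of Fooer1.Frob
    if (t == typeof(Fooer2)) return x - y; // inlined body of Fooer2.Frob
    return foo.Frob(x, y);                 // fallback for unknown implementors
}

interface IFoo
{
    int Frob(int x, int y);
}

class Fooer1 : IFoo
{
    public int Frob(int x, int y) => x + y;
}

class Fooer2 : IFoo
{
    public int Frob(int x, int y) => x - y;
}

// Stand-in for a type that was not known when the guards were written.
class Fooer3 : IFoo
{
    public int Frob(int x, int y) => x * y;
}
```

In the closed world case there is no Fooer3 to worry about, so the second type check and the fallback call can both be dropped, leaving the two-way branch shown in the disassembly.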
3. Interpreting static constructors at compile time and inlining readonly information
using System.Runtime.CompilerServices;
class Primes
{
    readonly static int ThePrime;
    [MethodImpl(MethodImplOptions.NoInlining)]
    static int Main()
    {
        return ThePrime;
    }
    static Primes()
    {
        const int Seq = 6;
        // Computes the Seq-th prime number and stores it in ThePrime.
        var candidates = new bool[(Seq + 1) * (Seq + 1)];
        for (int i = 2; i < candidates.Length; i++)
        {
            for (int j = i * 2; j < candidates.Length; j += i)
                candidates[j] = true;
        }
        int numPrime = 0;
        for (int i = 2; i < candidates.Length; i++)
        {
            if (!candidates[i])
                numPrime++;
            if (numPrime == Seq)
            {
                ThePrime = i;
                break;
            }
        }
    }
}
The code generated for Main is:
mov eax, 13
ret
Observations:
- The method body of Main simply returns a constant 13 (the sixth smallest prime).
- The static constructor got interpreted at compile time and when code generation saw the readonly bit on the field, it simply asked for the constant value to inline into code.
- The native code for the static constructor was not even generated.
- In .NET 9, this optimization gets extended to also work without readonly because whole program optimization can see this field is never assigned outside the static constructor (and there’s no reflection access to it either). This can be useful even when it’s not possible to mark something readonly in source code (because it’s modified), but the code doing the modification got optimized away and the field became effectively read only.
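As an illustration of the kind of field the last point describes, here is a small C# sketch. The Config class, the s_limit field and GetLimit are hypothetical names. The field is not marked readonly, but the only write to it is in the static constructor, so it fits the "effectively read only" description above. The sketch only shows the shape of such code; it does not demonstrate what any particular compiler version generates for it.

```csharp
using System;

Console.WriteLine(Config.GetLimit()); // prints 42

static class Config
{
    // Not marked readonly, but only ever written in the static constructor
    // and never accessed through reflection.
    static int s_limit;

    static Config()
    {
        s_limit = 6 * 7;
    }

    // A reader of the field; with the field effectively read only,
    // the value is known once the static constructor has been evaluated.
    public static int GetLimit() => s_limit;
}
```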
Summary
Being able to make “closed world” assumptions about the program opens the door to optimizations that are not possible in an open world. Native AOT first debuted in .NET 7 and there’s still a long road towards enabling more of these “closed world” optimizations.
Of course, JIT-compiled .NET also has tricks up its sleeve, such as the ability to monitor program behavior at runtime and recompile methods based on what it observes (dynamic PGO). You’ll often see JIT-based .NET beat AOT-based .NET in peak throughput. However, these closed world optimizations can sometimes give the edge to the AOT-compiled program.