C# Native AOT performance
How fast are .NET Native AOT applications compared to regular managed code? Can AOT outperform JIT? How do you benchmark Native AOT applications?
This article is part of a series about Native AOT in .NET. If you are not familiar with Native AOT, read the How to develop Native AOT applications in .NET part first.
This article compares .NET and Native AOT performance. First, we will review the official Microsoft benchmarks. They let you compare different .NET deployment options for simple ASP.NET applications.
Then, you will learn how to run your own benchmarks using the BenchmarkDotNet and hyperfine tools. Such benchmarks allow you to measure code speed in your environment.
ASP.NET benchmarks
The ASP.NET team maintains a solid infrastructure for performance testing. They test various scenarios in different environments.
We are most interested in the Native AOT benchmarks. The primary source of information is the following PowerBI dashboard. The data there rests on three pillars: test applications, deployment scenarios, and metrics.
Test applications
You can find the source code of benchmarks and test applications in the aspnet/Benchmarks repository.
Native AOT benchmarks compare 3 application types:
- Stage1 - a minimal API based on HTTP and JSON. The application source code is located in /src/BenchmarksApps/BasicMinimalApi.
- Stage1Grpc - a similar API based on gRPC (/src/BenchmarksApps/Grpc/BasicGrpc).
- Stage2 - a full web app involving a database and authentication (/src/BenchmarksApps/TodosApi).
.NET Deployment scenarios
Test applications are run in different environments. At the moment, benchmarks use Windows and Linux virtual machines with 28 cores. There are also separate Linux environments for ARM and Intel processors.
Applications are also tested in different configurations. Each combination of an application and a configuration defines a "scenario".
You can hold down the Ctrl (or ⌘) key to select multiple scenarios or environments on the PowerBI dashboard.
Metrics
Benchmarks collect fundamental metrics for every deployed application. For example, tests measure requests per second (RPS), startup time, and the maximum memory working set.
That allows us to compare metric values for various configurations of the same application.
Performance comparison
We will compare StageX scenarios with StageXAot and StageXAotSpeedOpt. They use the following configurations:
Scenario | dotnet publish build arguments |
---|---|
StageX | PublishAot=false EnableRequestDelegateGenerator=false |
StageXAot | PublishAot=true StripSymbols=true |
StageXAotSpeedOpt | PublishAot=true StripSymbols=true OptimizationPreference=Speed |
All scenarios above also use the DOTNET_GCDynamicAdaptationMode=1 environment variable.
StageXAotSpeedOpt scenarios allow you to estimate the impact of the OptimizationPreference=Speed setting.
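For reference, publishing an application with the StageXAotSpeedOpt properties would look roughly like this (the runtime identifier is a placeholder, not taken from the benchmark scripts):

```
dotnet publish -c Release -r linux-x64 -p:PublishAot=true -p:StripSymbols=true -p:OptimizationPreference=Speed
```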
You may review the StageXTrimR2RSingleFile scenarios too. They correspond to a trimmed ReadyToRun deployment, which is another form of ahead-of-time compilation in .NET. Sometimes, it is a good alternative to Native AOT.
Here are the current performance comparison results for .NET 9 Release Candidate (September 2024):
Startup time
AOT applications start much faster than managed versions. That's true for both Stage1 and Stage2 applications and for all environments. Sample results:
Scenario | Startup time (ms) |
---|---|
Stage2AotSpeedOpt | 100 |
Stage2Aot | 109 |
Stage2 | 528 |
Working set
The maximum working set of Native AOT applications is smaller than that of managed versions. On Linux, managed versions use about 1.5-2 times more RAM than AOT versions. For example:
Scenario | Max working set (MB) |
---|---|
Stage1Aot | 56 |
Stage1AotSpeedOpt | 57 |
Stage1 | 126 |
On Windows, the difference is smaller, especially for Stage2:
Scenario | Max working set (MB) |
---|---|
Stage2Aot | 152 |
Stage2AotSpeedOpt | 150 |
Stage2 | 167 |
Requests per second
Larger RPS values mean a faster application. The lightweight Stage1 application usually handles about 800-900K requests per second. The larger Stage2 application only handles about 200K requests.
For the Stage2 application, the .NET version handles more requests than the AOT versions in all environments. The speed of the Stage2AotSpeedOpt version is sometimes close, but it usually lies between Stage2 and Stage2Aot. Here are typical results:
Scenario | RPS |
---|---|
Stage2 | 235,008 |
Stage2AotSpeedOpt | 215,637 |
Stage2Aot | 194,264 |
The results for the Stage1 application are similar on Intel Linux and Intel Windows. However, on Ampere Linux, AOT beats the managed version. Sample results from Ampere Linux:
Scenario | RPS |
---|---|
Stage1AotSpeedOpt | 929,524 |
Stage1Aot | 912,344 |
Stage1 | 844,659 |
So, the environment and the application code may significantly affect speed. It makes sense to run your own benchmarks to estimate the Native AOT benefits for your project. Let's write custom benchmarks without the Microsoft testing infrastructure.
Benchmarking Native AOT applications
We will use 2 types of benchmarks. The first one is based on BenchmarkDotNet, the popular library for benchmarking .NET code. These benchmarks compare pure speed, excluding startup time.
The second one is based on the hyperfine tool. It compares the execution time of two shell commands. These benchmarks compare overall speed, including startup time.
We will not compare memory consumption here. At the moment, the NativeMemoryProfiler diagnoser in BenchmarkDotNet does not support the Native AOT runtime, and hyperfine does not currently track memory usage either.
You can download the source code from the NativeAotBenchmarks repository on GitHub. We encourage you to try the benchmarks in your environment. This article describes results from a Windows 11 laptop with an Intel Core i9-13900H processor and 16 GB RAM.
Make sure you run benchmarks properly. Here are the common recommendations:
- Use the Release build.
- Turn off all applications except the benchmark process. For example, disable antivirus software and close Visual Studio and web browsers.
- Keep your laptop plugged in and use the best performance mode.
- Use the same input data in the scenarios being compared.
Test cases
We will benchmark 2 scenarios in .NET 8:
1. Simple C# code for string compression using the counts of repeated characters. For example, the string "aabcccccaaa" becomes "a2b1c5a3":
string Compress(string s)
{
    // StringBuilder requires "using System.Text;"
    StringBuilder compressed = new(s.Length);
    for (int i = 0; i < s.Length; ++i)
    {
        char c = s[i];
        // Find where the current run of repeated characters ends.
        for (int j = i + 1; j <= s.Length; ++j)
        {
            if (j == s.Length || s[j] != c)
            {
                // Append the character and the length of its run.
                compressed.Append(c + $"{j - i}");
                i = j - 1;
                // Stop early if "compression" already made the string longer.
                if (compressed.Length > s.Length)
                    return s;
                break;
            }
        }
    }
    if (compressed.Length <= s.Length)
        return compressed.ToString();
    return s;
}
2. A heavier PDF to PNG conversion task that uses Docotic.Pdf.
Prerequisites
Install prerequisites for .NET Native AOT deployment.
Install hyperfine to run the corresponding benchmarks.
For PDF to PNG benchmarks, get a free time-limited license key on the Download C# .NET PDF library page. You need to apply the license key in the Helper.cs file.
BenchmarkDotNet
These benchmarks are located in the NativeAotBenchmarks
project. We compare results for
RuntimeMoniker.NativeAot80 and RuntimeMoniker.Net80. By default, BenchmarkDotNet builds Native AOT
code with the OptimizationPreference=Speed
setting.
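For illustration, a two-runtime comparison can be declared like this (a minimal sketch; the class, field, and StringCompression.Compress names are ours, not taken from the repository):

```csharp
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;

// One job per runtime: BenchmarkDotNet builds and runs the same
// benchmark on JIT-based .NET 8 and on Native AOT.
[SimpleJob(RuntimeMoniker.Net80, baseline: true)]
[SimpleJob(RuntimeMoniker.NativeAot80)]
public class CompressBenchmarks
{
    // A long string with runs of repeated characters compresses well.
    private readonly string _input =
        string.Concat(Enumerable.Repeat("aaaabbbccd", 1_000));

    [Benchmark]
    public string Compress() => StringCompression.Compress(_input);
}
```

Run it from a Release build with BenchmarkRunner.Run&lt;CompressBenchmarks&gt;().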
BenchmarkDotNet performs 6 or more warmup iterations. That lets JIT pre-compile code and collect some statistics. Thus, such benchmarks exclude startup time from the comparison.
String compression
The CompressString benchmark for string compression uses a long string with duplicate characters.
A common mistake would be to generate a random string. In that case, the benchmarks for Native AOT and .NET 8 would use different input strings. It is possible to use random strings, but you need to initialize the random generator with the same seed.
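For example, a fixed seed makes the "random" input identical in every benchmark process (a sketch; the length and alphabet are arbitrary):

```csharp
using System.Linq;

// The fixed seed guarantees the same pseudo-random sequence
// for both the .NET 8 and the Native AOT benchmark runs.
var random = new Random(42);
string input = string.Concat(
    Enumerable.Range(0, 100_000).Select(_ => (char)random.Next('a', 'e' + 1)));
```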
The Native AOT version runs about 1.08 times faster than the .NET 8 version:
Method | Runtime | Mean | Error | StdDev |
---|---|---|---|---|
Compress | .NET 8.0 | 4.117 ms | 0.0553 ms | 0.0517 ms |
Compress | NativeAOT 8.0 | 3.809 ms | 0.0403 ms | 0.0377 ms |
PDF to PNG
PDF to PNG benchmarks process PDF documents in memory. That excludes interaction with the file system, as disk I/O operations can skew benchmark results.
We test speed with two PDF documents. The first one, Banner Edulink One.pdf, is more complex. It is converted to a 72 dpi PNG and requires more time for processing. The .NET 8 version is slightly faster for this document:
Method | Runtime | Mean | Error | StdDev |
---|---|---|---|---|
Convert | .NET 8.0 | 1.103 s | 0.0156 s | 0.0146 s |
Convert | NativeAOT 8.0 | 1.167 s | 0.0160 s | 0.0149 s |
The second document is smaller and simpler. It is converted to a 300 dpi PNG. Here, the speed is almost equal:
Method | Runtime | Mean | Error | StdDev |
---|---|---|---|---|
Convert | .NET 8.0 | 290.1 ms | 5.78 ms | 6.88 ms |
Convert | NativeAOT 8.0 | 288.3 ms | 4.44 ms | 3.94 ms |
hyperfine
These benchmarks are located in the NativeAotTestApp project. The project does not use the OptimizationPreference=Speed setting. You can enable it in NativeAotTestApp.csproj:
<OptimizationPreference>Speed</OptimizationPreference>
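For context, the property goes inside a PropertyGroup of the project file; here is a generic sketch (the surrounding property is typical for a Native AOT project, not copied from the repository):

```xml
<PropertyGroup>
  <PublishAot>true</PublishAot>
  <OptimizationPreference>Speed</OptimizationPreference>
</PropertyGroup>
```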
Use the benchmark.bat script to run tests on Windows. You can convert it to Bash for Unix/Linux-based operating systems. The script builds the .NET 8 and Native AOT versions of the same app. Then, it compares their performance with commands similar to:
hyperfine --warmup 3 "net8-app.exe" "native-aot-app.exe"
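A rough Bash counterpart could look like this (a sketch under assumptions: the runtime identifier, output paths, and publish flags are ours, not taken from benchmark.bat):

```bash
#!/bin/bash
# Publish a framework-dependent .NET 8 build and a Native AOT build,
# then let hyperfine compare the two binaries.
dotnet publish -c Release -r linux-x64 --self-contained false -p:PublishAot=false -o out/jit
dotnet publish -c Release -r linux-x64 -p:PublishAot=true -o out/aot
hyperfine --warmup 3 "./out/jit/NativeAotTestApp" "./out/aot/NativeAotTestApp"
```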
Warmup runs in hyperfine ensure that test applications start with "warm" disk caches. Unlike BenchmarkDotNet, the hyperfine warmup does not help JIT. Therefore, hyperfine benchmarks compare total application speed, including startup time.
Our test application supports an iteration count argument. It repeats the same code multiple times in a simple loop:
for (int i = 0; i < iterationCount; ++i)
CompressString(args);
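In the test app, this could be wired up along these lines (a hypothetical sketch; the actual argument handling in NativeAotTestApp may differ):

```csharp
// Top-level Program.cs: treat the last argument as the iteration count.
int iterationCount = args.Length > 0 && int.TryParse(args[^1], out int n) ? n : 1;

for (int i = 0; i < iterationCount; ++i)
    CompressString(args); // wraps the Compress method shown earlier
```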
The idea is to decrease the impact of the startup time difference. Repeating the same code also gives JIT a chance to collect more runtime statistics and generate faster code.
A common situation is the following. The first time, you run benchmarks with a single iteration, and the Native AOT version works much faster. Then, you run the same benchmarks with multiple iterations, and the total speed of both versions becomes equal. That means that after startup, the managed version is actually faster.
String compression
For 100,000 iterations of the same input string compression, the Native AOT performance is better:
Benchmark 1: .NET 8 version (100000 iterations)
Time (mean ± σ): 151.5 ms ± 2.6 ms [User: 32.1 ms, System: 1.6 ms]
Range (min … max): 148.0 ms … 157.5 ms 19 runs
Benchmark 2: Native AOT version (100000 iterations)
Time (mean ± σ): 55.1 ms ± 3.1 ms [User: 15.0 ms, System: 2.1 ms]
Range (min … max): 51.6 ms … 65.9 ms 51 runs
Summary
Native AOT version ran 2.75 ± 0.16 times faster than .NET 8 version
But the speed becomes almost the same for 10,000,000 iterations:
Benchmark 1: .NET 8 version (10000000 iterations)
Time (mean ± σ): 3.984 s ± 0.139 s [User: 2.946 s, System: 0.009 s]
Range (min … max): 3.790 s … 4.182 s 10 runs
Benchmark 2: Native AOT version (10000000 iterations)
Time (mean ± σ): 3.956 s ± 0.041 s [User: 2.848 s, System: 0.004 s]
Range (min … max): 3.888 s … 4.016 s 10 runs
Summary
Native AOT version ran 1.01 ± 0.04 times faster than .NET 8 version
PDF to PNG
For a single iteration of Banner Edulink One.pdf to PNG conversion, the AOT version runs about 1.88 times faster than the .NET 8 version:
Benchmark 1: .NET 8 version (1 iteration)
Time (mean ± σ): 2.417 s ± 0.104 s [User: 1.334 s, System: 0.116 s]
Range (min … max): 2.295 s … 2.629 s 10 runs
Benchmark 2: Native AOT version (1 iteration)
Time (mean ± σ): 1.288 s ± 0.011 s [User: 0.573 s, System: 0.123 s]
Range (min … max): 1.274 s … 1.310 s 10 runs
For 20 iterations, the speed difference is negligible:
Benchmark 1: .NET 8 version (20 iterations)
Time (mean ± σ): 25.048 s ± 0.223 s [User: 13.278 s, System: 2.312 s]
Range (min … max): 24.751 s … 25.423 s 10 runs
Benchmark 2: Native AOT version (20 iterations)
Time (mean ± σ): 25.213 s ± 0.114 s [User: 12.661 s, System: 2.275 s]
Range (min … max): 25.042 s … 25.350 s 10 runs
Summary
.NET 8 version ran 1.01 ± 0.01 times faster than Native AOT version
For 3BigPreview.pdf, the Native AOT version is faster even with 100 iterations:
Benchmark 1: .NET 8 version (100 iterations)
Time (mean ± σ): 10.009 s ± 0.152 s [User: 5.298 s, System: 0.567 s]
Range (min … max): 9.677 s … 10.189 s 10 runs
Benchmark 2: Native AOT version (100 iterations)
Time (mean ± σ): 8.336 s ± 0.070 s [User: 3.405 s, System: 0.505 s]
Range (min … max): 8.247 s … 8.459 s 10 runs
Summary
Native AOT version ran 1.20 ± 0.02 times faster than .NET 8 version
Conclusion
Native AOT applications start faster compared to regular .NET applications. The official benchmarks also show that AOT applications have smaller memory footprints.
But after startup, managed applications usually show better speed. That happens because JIT has access to runtime information. In long-running applications, it can regenerate more efficient code based on dynamic profile-guided optimization and other techniques.
ASP.NET benchmarks allow you to compare different configurations from the performance perspective. However, results depend on the operating system and processor architecture. You need to run your own benchmarks in your target environment to find the optimal deployment configuration.