[C#] Prefetching async methods for performance
Prefetching is a technique that starts loading data before it is needed, reducing total runtime at the risk of loading unnecessary data.
In code, there is usually precise knowledge of what a method needs before it executes, which makes it easier to control when certain data should start loading.
In C#, async methods and tasks allow data to be fetched without interrupting the flow of the code. By prefetching data, an algorithm can hide the downtime of I/O operations, doing other work while waiting for the I/O result.
Below are some examples of prefetching with asynchrony:
Prefetching one result
Calling an async method starts its execution immediately, without interrupting the code flow. By holding a reference to the returned task, it is possible to await the result only when it is actually needed.
public static async Task<string> GetHtmlAsync(string uri)
{
    using (var client = new HttpClient())
        return await client.GetStringAsync(uri);
}
// Start prefetching
var taskHtml = GetHtmlAsync("https://domain.com");
CodeWithoutHtml();
// Await and consume the result
var html = await taskHtml;
CodeWithHtml(html);
Prefetching multiple results
Calling several async methods in sequence without awaiting them is enough to start fetching their data concurrently, without interrupting the code flow, until an await is executed.
// Start pre-fetching
var taskHtml1 = GetHtmlAsync("https://domain1.com");
var taskHtml2 = GetHtmlAsync("https://domain2.com");
var taskHtml3 = GetHtmlAsync("https://domain3.com");
// Await results
var html1 = await taskHtml1;
var html2 = await taskHtml2;
var html3 = await taskHtml3;
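When all of the results are needed at the same point, a small variation is to await the prefetched tasks together with Task.WhenAll, which completes once every task has finished and returns the results in the same order. A minimal sketch reusing the tasks above:
// Start pre-fetching
var taskHtml1 = GetHtmlAsync("https://domain1.com");
var taskHtml2 = GetHtmlAsync("https://domain2.com");
var taskHtml3 = GetHtmlAsync("https://domain3.com");
// Await all results at once; the array keeps the task order
var htmls = await Task.WhenAll(taskHtml1, taskHtml2, taskHtml3);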
Chaining prefetching
There are situations where an async method requires the result of another async method, creating a chain of fetches.
Tasks can be chained with ContinueWith(), which registers a continuation that starts executing as soon as the original task completes, and Unwrap(), which exposes the inner task created inside the ContinueWith callback. Note that, since the continuation only starts after the task has completed, reading the .Result property inside it does not block the thread.
// Start pre-fetching
var taskUrl = RetrieveUrlAsync();
var taskStatusCode = taskUrl.ContinueWith(async (task) =>
{
    return await GetStatusCodeAsync(task.Result);
}).Unwrap();
var taskFavicon = taskUrl.ContinueWith(async (task) =>
{
    return await HasFaviconAsync(task.Result);
}).Unwrap();
// Await results
var statusCode = await taskStatusCode;
var hasFavicon = await taskFavicon;
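The same chain can also be sketched with plain async helper methods that await the URL task internally, avoiding ContinueWith/Unwrap altogether. The helper names below are hypothetical, and the return types (string for the URL, int and bool for the results) are assumptions, since RetrieveUrlAsync, GetStatusCodeAsync and HasFaviconAsync are not shown here:
// Hypothetical helpers; return types are assumed for illustration
static async Task<int> GetStatusCodeChainedAsync(Task<string> urlTask)
    => await GetStatusCodeAsync(await urlTask);
static async Task<bool> HasFaviconChainedAsync(Task<string> urlTask)
    => await HasFaviconAsync(await urlTask);

// Start pre-fetching: both chains continue as soon as the URL is available
var taskUrl = RetrieveUrlAsync();
var taskStatusCode = GetStatusCodeChainedAsync(taskUrl);
var taskFavicon = HasFaviconChainedAsync(taskUrl);
// Await results
var statusCode = await taskStatusCode;
var hasFavicon = await taskFavicon;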
Prefetching with IAsyncEnumerable
When using IAsyncEnumerable, it is also possible to prefetch the next result by manipulating the IAsyncEnumerator, as demonstrated by the following extension method. The benchmarks below were generated with BenchmarkDotNet.
public static async IAsyncEnumerable<T> WithPrefetch<T>(this IAsyncEnumerable<T> enumerable)
{
    await using (var enumerator = enumerable.GetAsyncEnumerator())
    {
        // Start fetching the first item
        ValueTask<bool> hasNextTask = enumerator.MoveNextAsync();
        while (await hasNextTask)
        {
            T data = enumerator.Current;
            // Start fetching the next item before yielding the current one
            hasNextTask = enumerator.MoveNextAsync();
            yield return data;
        }
    }
}
// Prefetching 1 item
await foreach(var item in EnumerateAsync().WithPrefetch())
Process(item);
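EnumerateAsync and Process are not shown here; for the timing scenarios in the table below they can be pictured as a simulated I/O fetch and a simulated CPU-bound processing step. The following sketch is only an illustration (the item count and delays are placeholders) and needs System.Collections.Generic, System.Threading and System.Threading.Tasks:
// Hypothetical data source: each item takes fetchMs of simulated I/O
public static async IAsyncEnumerable<int> EnumerateAsync(int items = 20, int fetchMs = 20)
{
    for (var i = 0; i < items; i++)
    {
        await Task.Delay(fetchMs); // simulated fetch latency
        yield return i;
    }
}
// Hypothetical consumer: each item takes processMs of simulated CPU work
public static void Process(int item, int processMs = 20)
    => Thread.Sleep(processMs); // simulated processing time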
There is a significant performance improvement in prefetching data when fetch time and processing time are close. Using prefetch on data that is already in memory only adds overhead, with no gains.
Fetch | Process | Execution time (no-prefetch) | Execution time (prefetch) | Improvement |
0 ms | 0 ms | 0.0006 ms | 0.0013 ms | -116% |
20 ms | 20 ms | 964 ms | 497 ms | 93% |
100 ms | 20 ms | 2117 ms | 1675 ms | 26% |
20 ms | 100 ms | 2118 ms | 1674 ms | 26% |
200 ms | 20 ms | 3540 ms | 3099 ms | 14% |
20 ms | 200 ms | 3572 ms | 3093 ms | 15% |
There is no benefit in prefetching more items when only a single item is processed at a time. Below is the code to prefetch more than a single item:
public static IAsyncEnumerable<T> WithPrefetch<T>(this IAsyncEnumerable<T> enumerable, int prefetchDepth)
{
    while (prefetchDepth > 0)
    {
        // Each wrap adds one more level of prefetching
        enumerable = enumerable.WithPrefetch();
        prefetchDepth--;
    }
    return enumerable;
}
// Prefetching 10 items
await foreach(var item in EnumerateAsync().WithPrefetch(10))
Process(item);
Benchmarks demonstrate that prefetching more than one item causes a linear increase in execution time.
The reference method sums 100 numbers from an enumerator whose data is loaded synchronously, asynchronously, or asynchronously with prefetch (1 ms = 1,000,000 ns).
Method | Mean | Error | StdDev | Gen0 | Allocated |
Sync | 132.4 ns | 0.53 ns | 0.44 ns | - | - |
AsyncWithoutPrefetch | 5,062.2 ns | 49.27 ns | 46.09 ns | 0.0381 | 168 B |
AsyncWithPrefetch_01_Record | 8,872.0 ns | 175.48 ns | 195.05 ns | 0.0763 | 344 B |
AsyncWithPrefetch_02_Records | 15,175.8 ns | 260.77 ns | 243.93 ns | 0.1221 | 520 B |
AsyncWithPrefetch_04_Records | 21,440.6 ns | 143.50 ns | 127.21 ns | 0.1831 | 872 B |
AsyncWithPrefetch_08_Records | 37,753.9 ns | 251.59 ns | 196.43 ns | 0.3662 | 1576 B |
AsyncWithPrefetch_16_Records | 78,780.6 ns | 944.58 ns | 837.35 ns | 0.6104 | 2984 B |
AsyncWithPrefetch_32_Records | 159,310.1 ns | 2,197.14 ns | 2,055.21 ns | 1.2207 | 5800 B |
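The original benchmark code is not shown; a BenchmarkDotNet harness along the following lines would produce methods like the ones above. The enumerator and method bodies are a sketch under those assumptions, not the actual benchmark:
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
public class PrefetchOverheadBenchmarks
{
    // 100 numbers already in memory, so MoveNextAsync completes synchronously
    // and only the prefetch machinery itself is measured
    private static async IAsyncEnumerable<int> EnumerateNumbersAsync()
    {
        await Task.CompletedTask;
        for (var i = 0; i < 100; i++)
            yield return i;
    }

    [Benchmark]
    public int Sync() => Enumerable.Range(0, 100).Sum();

    [Benchmark]
    public async Task<int> AsyncWithoutPrefetch()
    {
        var sum = 0;
        await foreach (var n in EnumerateNumbersAsync())
            sum += n;
        return sum;
    }

    [Benchmark]
    public async Task<int> AsyncWithPrefetch_01_Record()
    {
        var sum = 0;
        await foreach (var n in EnumerateNumbersAsync().WithPrefetch(1))
            sum += n;
        return sum;
    }
    // ...and so on for depths 2, 4, 8, 16 and 32
}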
When only one of fetching or processing takes time, prefetching does not change the execution time of the function, regardless of the configured prefetch depth.
Method | Mean | Error | StdDev | Gen0 | Allocated |
Fetch0ms_Process20ms_Prefetch0 | 468.0 ms | 9.0 ms | 7.5 ms | - | 16608 B
Fetch0ms_Process20ms_Prefetch1 | 466.0 ms | 2.2 ms | 2.0 ms | - | 16784 B
Fetch0ms_Process20ms_Prefetch2 | 470.8 ms | 7.9 ms | 7.0 ms | - | 10992 B
Fetch0ms_Process20ms_Prefetch10 | 465.8 ms | 4.2 ms | 3.3 ms | - | 18376 B
Fetch20ms_Process0ms_Prefetch0 | 469.9 ms | 5.8 ms | 5.4 ms | - | 17072 B |
Fetch20ms_Process0ms_Prefetch1 | 466.4 ms | 3.2 ms | 2.8 ms | - | 17664 B |
Fetch20ms_Process0ms_Prefetch2 | 466.3 ms | 3.0 ms | 2.5 ms | - | 17976 B |
Fetch20ms_Process0ms_Prefetch10 | 466.0 ms | 2.6 ms | 2.3 ms | - | 20840 B |
Increasing the prefetch depth beyond one also does not improve performance, even when processing time and fetching time favor overlapping work.
Method | Mean | Improvement to non-prefetch |
Fetch100ms_Process20ms_Prefetch0 | 2.117 s | N/A |
Fetch100ms_Process20ms_Prefetch1 | 1.675 s | 26.39% |
Fetch100ms_Process20ms_Prefetch2 | 1.681 s | 25.94% |
Fetch100ms_Process20ms_Prefetch10 | 1.675 s | 26.39% |
Method | Mean | Improvement to non-prefetch |
Fetch200ms_Process20ms_Prefetch0 | 3.540 s | N/A |
Fetch200ms_Process20ms_Prefetch1 | 3.099 s | 14.23% |
Fetch200ms_Process20ms_Prefetch2 | 3.096 s | 15.35% |
Fetch200ms_Process20ms_Prefetch10 | 3.099 s | 14.23% |
Method | Mean | Improvement to non-prefetch |
Fetch20ms_Process100ms_Prefetch0 | 2.118 s | N/A |
Fetch20ms_Process100ms_Prefetch1 | 1.674 s | 26.52% |
Fetch20ms_Process100ms_Prefetch2 | 1.680 s | 26.07% |
Fetch20ms_Process100ms_Prefetch10 | 1.675 s | 26.45% |
Method | Mean | Improvement to non-prefetch |
Fetch20ms_Process200ms_Prefetch0 | 3.527 s | N/A |
Fetch20ms_Process200ms_Prefetch1 | 3.093 s | 14.03% |
Fetch20ms_Process200ms_Prefetch2 | 3.096 s | 13.92% |
Fetch20ms_Process200ms_Prefetch10 | 3.091 s | 14.11% |