[C#] The power of IEnumerable and Yield
IEnumerable is an interface that represents a sequence of data that can be evaluated item by item, forward-only, much like a stream.
Built-in collections, such as arrays and List<T>, already implement IEnumerable, which makes them easy to consume.
IEnumerable instances are most often consumed in a foreach statement.
IEnumerable<string> items = new string[] { "stub" };
foreach (string item in items)
    Console.WriteLine(item.ToLower());
System.Linq offers extension methods that make an IEnumerable easier to manipulate, such as Count() to count the items, Select() to project each item into a new object, and Sum() to add up all the items in the sequence.
using System.Linq;
var items = new int[] { 1, 2, 3 };
var summation = items.Sum();
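As a complement to the Sum() snippet, the sketch below also exercises Count() and Select(). The class name LinqBasicsDemo and the Describe helper are hypothetical, introduced only for this illustration.

```csharp
using System;
using System.Linq;

public static class LinqBasicsDemo
{
    // Hypothetical helper: summarizes an array using the three LINQ methods.
    public static string Describe(int[] items)
    {
        int count = items.Count();                    // number of items
        var labels = items.Select(i => "item " + i);  // one new string per item
        int sum = items.Sum();                        // total of all items

        string joined = string.Join(", ", labels);
        return count + " items, sum " + sum + ": " + joined;
    }

    public static void Main()
    {
        // Prints: 3 items, sum 6: item 1, item 2, item 3
        Console.WriteLine(Describe(new int[] { 1, 2, 3 }));
    }
}
```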
Under the hood, an IEnumerable is enumerated through the IEnumerator it generates, which can be consumed manually.
IEnumerable<string> items = new string[] { "stub" };
using (IEnumerator<string> enumerator = items.GetEnumerator())
    while (enumerator.MoveNext())
        Console.WriteLine(enumerator.Current.ToLower());
IEnumerable can also be the return type of generator functions, where the data is produced on the fly instead of being loaded into memory all at once.
In C#, a generator function is written with the yield keyword.
Such functions are particularly useful for processing a data source without first loading the entire data set into memory, which allows algorithms whose memory use stays constant, O(1) with respect to the number of items, while the function is evaluated.
The example below shows one way to write a generator function that can read .csv files of any size by returning their contents line by line.
public static IEnumerable<string> EnumerateLinesOfFile()
{
    using (var reader = new StreamReader("file.csv"))
        while (!reader.EndOfStream)
            yield return reader.ReadLine();
}
static void Main(string[] args)
{
    var lineCount = EnumerateLinesOfFile().Count();
    var message = $"The file has {lineCount} lines";
    Console.WriteLine(message);
}
Another useful scenario is processing a SQL query that returns a large result set, which can be iterated without loading everything into memory.
Note that the connection stays open while the loop is running and is only closed when the IEnumerator is disposed, either implicitly (at the end of the foreach loop) or explicitly (manually or through a using statement).
public IEnumerable<string> EnumerateFromDatabase()
{
    using (var connection = new SqlConnection("<connection_string>"))
    using (var command = connection.CreateCommand())
    {
        command.CommandText = "<select>";
        connection.Open();
        using (var reader = command.ExecuteReader())
            while (reader.Read())
                yield return reader[0].ToString();
    }
}
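The disposal behaviour described above can be observed without a database. The sketch below is only an illustration: FakeResource is a hypothetical stand-in for the connection, and Log is a hypothetical list that records the order of events. It suggests that leaving the foreach loop early still disposes the enumerator, which in turn runs the using block inside the generator.

```csharp
using System;
using System.Collections.Generic;

public static class DisposalDemo
{
    // Hypothetical log, used only to make the order of events visible.
    public static readonly List<string> Log = new List<string>();

    // Hypothetical stand-in for a connection: records open and dispose.
    private sealed class FakeResource : IDisposable
    {
        public FakeResource() { Log.Add("opened"); }
        public void Dispose() { Log.Add("disposed"); }
    }

    public static IEnumerable<int> ReadRows()
    {
        using (new FakeResource())
        {
            for (int i = 1; i <= 3; i++)
            {
                Log.Add("row " + i);
                yield return i;
            }
        }
    }

    public static void Main()
    {
        foreach (int row in ReadRows())
        {
            if (row == 2)
                break; // leaving the loop disposes the enumerator
        }

        // Prints: opened, row 1, row 2, disposed
        Console.WriteLine(string.Join(", ", Log));
    }
}
```

Row 3 is never produced, yet the resource is still released, which is the same mechanism that closes the connection in the database example.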
Generator functions only begin to run when the IEnumerable starts to be evaluated, for example through a foreach loop or an IEnumerator. Some LINQ methods, such as Count(), FirstOrDefault() and ToList(), also enumerate the sequence.
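A minimal sketch of this deferred start, assuming a hypothetical Calls counter that is incremented on the first line of the generator body. Calling Numbers() alone does not touch the counter; each enumeration does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Hypothetical counter: how many times the generator body has started.
    public static int Calls;

    public static IEnumerable<int> Numbers()
    {
        Calls++; // runs on the first MoveNext(), not when Numbers() is called
        yield return 1;
        yield return 2;
    }

    public static void Main()
    {
        IEnumerable<int> numbers = Numbers();
        Console.WriteLine(Calls); // 0: nothing has run yet

        int count = numbers.Count();
        Console.WriteLine(Calls); // 1: Count() enumerated the sequence

        List<int> list = numbers.ToList();
        Console.WriteLine(Calls); // 2: a second enumeration reruns the body
    }
}
```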
It is important to note that some LINQ methods, such as Select() and Where(), behave like generator functions themselves and are not evaluated immediately when called, regardless of whether they are called on an on-the-fly IEnumerable or an in-memory collection.
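The sketch below illustrates this on an in-memory List<int>. The class name LazyLinqDemo and the Run helper are hypothetical. Because the query is only built when Where() and Select() are called, an element added afterwards still shows up once the query is finally enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyLinqDemo
{
    // Hypothetical helper: builds a query, mutates the source, then evaluates.
    public static string Run()
    {
        var numbers = new List<int> { 1, 2, 3 };

        // No element is visited here; only the query is built.
        IEnumerable<int> doubled = numbers.Where(n => n > 1).Select(n => n * 2);

        numbers.Add(4); // added after the query was defined

        // string.Join enumerates the query now, so it sees the new element.
        return string.Join(", ", doubled);
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // 4, 6, 8
    }
}
```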
A less common option is to write generator functions that never end, whose evaluation can be interrupted with a break statement or with limiting LINQ methods such as First() or Take().
/// <summary>Function that generates Fibonacci numbers</summary>
static IEnumerable<int> Fibonacci()
{
    int value1 = 1;
    int value2 = 1;
    yield return value1;
    yield return value2;
    while (true)
    {
        int aux = value2;
        value2 += value1;
        value1 = aux;
        yield return value2;
    }
}
static void Main(string[] args)
{
    // Take the 10th Fibonacci number = 55
    var fibonacciTenth = Fibonacci().Skip(9).First();
    // Sum the first 5 numbers = 12
    var sumFirstFive = Fibonacci().Take(5).Sum();
    // Never completes: the sequence is infinite, so ToList() keeps growing
    // the list until it runs out of memory and throws an OutOfMemoryException
    var allFibonacciNumbers = Fibonacci().ToList();
}
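The example above stops the sequence with First() and Take(). For the break statement, a minimal sketch could look like the following. The FibonacciBelow helper and the limit of 50 are hypothetical choices for this illustration, and the generator is a compact restatement that yields the same values as Fibonacci() above.

```csharp
using System;
using System.Collections.Generic;

public static class BreakDemo
{
    // Yields the same values as the Fibonacci() generator above: 1, 1, 2, 3, 5, ...
    public static IEnumerable<int> Fibonacci()
    {
        int previous = 0;
        int current = 1;
        while (true)
        {
            yield return current;
            int next = previous + current;
            previous = current;
            current = next;
        }
    }

    // Hypothetical helper: collects the values that are smaller than limit.
    public static List<int> FibonacciBelow(int limit)
    {
        var result = new List<int>();
        foreach (int value in Fibonacci())
        {
            if (value >= limit)
                break; // stops the otherwise endless enumeration
            result.Add(value);
        }
        return result;
    }

    public static void Main()
    {
        // Prints: 1, 1, 2, 3, 5, 8, 13, 21, 34
        Console.WriteLine(string.Join(", ", FibonacciBelow(50)));
    }
}
```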
C# 8.0 also offers an IAsyncEnumerable interface for generator functions with asynchronous capabilities. This link leads to an article about asynchrony.
LINQ methods can be used on IAsyncEnumerable by adding the officially supported package System.Linq.Async.
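// Generator function with async
public async IAsyncEnumerable<string> EnumerateLinesAsync()
{
    using (var reader = new StreamReader("path.csv"))
    {
        string line;
        // ReadLineAsync() returns null at the end of the file; checking
        // reader.EndOfStream here could block on a synchronous read
        while ((line = await reader.ReadLineAsync()) != null)
            yield return line;
    }
}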
// Consuming an IAsyncEnumerable
await foreach (var line in EnumerateLinesAsync())
{
    Console.WriteLine(line);
}
Note that the IAsyncEnumerable interface is only built into .NET Standard 2.1, .NET Core 3.0 and newer versions; it is not part of .NET Framework (4.8 at the time of writing).
Using IAsyncEnumerable is the recommended way to write asynchronous generator functions. Staying asynchronous while wrapping a plain IEnumerable is not trivial, because IEnumerator exposes MoveNext() as a blocking method.