Vinicius Quinafelex Alves


[Concept] Commitment ordering

It's common for object-oriented languages to handle database records by creating classes and loading them as objects that mirror the database structure. Eric Evans, in his book on domain-driven design, classifies such classes as Entity classes.

This way, business rules can be executed against those objects to verify and change their values, and the changed objects can later be persisted back to the database consistently.

Concurrency scenarios

In environments with parallel execution, such as web APIs, a single entity can be requested by multiple threads at the same time, with each thread changing the object's data according to the business rules being handled.

However, when all those different object instances are persisted to a single database record, conflicts can arise and data may be overwritten, breaking consistency.

Inconsistency example

The code below demonstrates a scenario where multiple threads try to increment a value on the same entity 10,000 times in parallel.

var userId = 1;

// Setup
var userRepository = new UserRepository();
var parallelOptions = new ParallelOptions() 
{ 
    MaxDegreeOfParallelism = 10 
};

// Execution
Parallel.For(0, 10_000, parallelOptions, (attempt) => 
{
    var user = userRepository.GetByID(userId);
    user!.IncrementalValue++;

    userRepository.Update(user);
});
// UserRepository Class
public void Update(User user)
{
    using(var conn = new MySqlConnection(ConnectionString))
    using(var command = conn.CreateCommand())
    {
        command.CommandText = "UPDATE User SET IncrementalValue = @IncrementalValue WHERE ID = @ID;";
        command.Parameters.AddWithValue("@ID", user.ID);
        command.Parameters.AddWithValue("@IncrementalValue", user.IncrementalValue);

        conn.Open();
        command.ExecuteNonQuery();
    }
}

Since each execution increments the value by 1, the expectation is that after 10,000 runs IncrementalValue ends up at 10,000. But since the UPDATE statement blindly overwrites the data, under high concurrency the final value is likely much lower, sometimes below 2,000 in test runs.

In a production environment, this could mean losing more than 80% of the operations.

Making it consistent

Those scenarios require some form of concurrency control, so one thread doesn't overwrite the results of the others.

A popular way to avoid conflicts is locking: one thread cannot use a resource or enter a code section while another thread holds the lock on it.

However, locking has a few drawbacks, such as requiring deadlock management, proper granularity to avoid excessive contention, and fault tolerance in distributed scenarios.

Besides locks, there are many other concurrency control methods and implementations, each better suited to different scenarios.
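As an illustration, the increment from the earlier example could be serialized with C#'s built-in lock statement. This is only a sketch: it reuses the userRepository and userId names from the example above, and a process-local lock like this only coordinates threads within a single process.

```csharp
// Shared lock object, one per protected resource.
private static readonly object _userLock = new object();

// Inside the Parallel.For body: only one thread at a time
// may read, increment, and persist the entity.
lock (_userLock)
{
    var user = userRepository.GetByID(userId);
    user!.IncrementalValue++;

    userRepository.Update(user);
}
```

Note that this does nothing to protect against other processes or machines writing to the same record, which is one reason distributed scenarios call for a different approach.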

Commitment ordering

Commitment ordering is an optimistic concurrency control method that doesn't use locks.

Under this method, the object's data can only be persisted if the database still contains the same data from which the object was originally loaded. One way to check this is by versioning the record: every update to the record also changes its version.

Version mismatches can be handled according to the use case. For example, if the function is triggered by users, the application can throw an error informing them that the source data has changed, then refresh the displayed data so they can decide whether the operation should still be executed over the new data.
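A minimal sketch of that user-facing strategy, assuming a TryUpdateWithToken method that returns false on a version mismatch (DBConcurrencyException from System.Data is just one reasonable exception type for this):

```csharp
using System.Data;

// If the versioned update fails, surface the conflict to the caller
// instead of silently overwriting newer data.
if (!userRepository.TryUpdateWithToken(user))
{
    throw new DBConcurrencyException(
        "This record was changed by someone else. " +
        "The latest data has been reloaded; please review and try again.");
}
```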

This makes it a fairly easy concurrency control solution for distributed or containerized systems, where external locks can become tricky to manage.

Implementation

Some databases and libraries already offer support for this technique. For example, SQL Server offers the rowversion column type, and the Entity Framework ORM has concurrency tokens. However, the logic is not hard to implement on most SQL databases.
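For reference, a concurrency token can be configured in Entity Framework Core roughly like this (a sketch; the User class and its Version property follow the example below, and the surrounding DbContext is assumed):

```csharp
public class AppDbContext : DbContext
{
    public DbSet<User> Users => Set<User>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Any UPDATE generated by EF includes the original Version value
        // in its WHERE clause; a mismatch throws
        // DbUpdateConcurrencyException instead of overwriting data.
        modelBuilder.Entity<User>()
            .Property(u => u.Version)
            .IsConcurrencyToken();
    }
}
```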

The example below demonstrates a simplified implementation, handling mismatches by simply restarting the process from scratch and reloading all the entities with the most up-to-date data.

var userId = 1;

// Setup
var userRepository = new UserRepository();
var parallelOptions = new ParallelOptions() 
{ 
    MaxDegreeOfParallelism = 10 
};

// Execution
Parallel.For(0, 10_000, parallelOptions, (attempt) => 
{
    var successful = false;

    do
    {
        var user = userRepository.GetByID(userId);
        user!.IncrementalValue++;

        successful = userRepository.TryUpdateWithToken(user);
    } while(!successful);
});
// UserRepository Class
public bool TryUpdateWithToken(User user)
{
    var newVersion = Guid.NewGuid();

    using(var conn = new MySqlConnection(ConnectionString))
    using(var command = conn.CreateCommand())
    {
        command.CommandText = "UPDATE User SET IncrementalValue = @IncrementalValue, Version = @newVersion WHERE ID = @ID AND Version = @oldVersion;";
        command.Parameters.AddWithValue("@ID", user.ID);
        command.Parameters.AddWithValue("@IncrementalValue", user.IncrementalValue);
        command.Parameters.AddWithValue("@oldVersion", user.Version);
        command.Parameters.AddWithValue("@newVersion", newVersion);

        conn.Open();
        var changedRowCount = command.ExecuteNonQuery();
        var hasChanged = changedRowCount > 0;

        if(hasChanged)
            user.Version = newVersion;

        return hasChanged;
    }
}

After the execution, IncrementalValue correctly ends up at 10,000. However, this retry strategy is best suited to situations where only a small number of threads try to update the same entity in parallel; otherwise the continuous retries can become computationally expensive.

In this demonstration alone, each iteration triggered 9 version mismatches on average before finally achieving a successful update, so the UPDATE statement was executed around 100,000 times in total (about 10 attempts per increment).