Scalability tips
In computing, scalability is the capability of a system to handle an increasing workload gracefully. In other words, it describes how well the system behaves when there are more users, more requests or more data, and how easy and cheap it is to increase the system's throughput.
The concept is related to performance, but the two are not the same. Performance usually refers to how long an operation takes to complete, while scalability focuses on how many operations the system can handle concurrently without breaking. Note that some test scenarios include scalability as one indicator of overall performance.
Systems with poor scalability have a hard limit on how much workload they can handle. This can hinder the growth of the company, because increasing the number of users or the volume of data may break the software.
Below are a few insights on how to increase and maintain the scalability of a system:
Enable horizontal scaling whenever possible
Prefer databases that can scale horizontally out of the box over those that cannot, for example MongoDB, which supports sharding natively, over a traditional single-server SQL database. Databases that cannot scale out can usually only be scaled vertically, which gets disproportionately more expensive as the hardware grows and eventually hits a ceiling. Splitting the database per tenant or across different microservices can also be a valid strategy.
Consider deploying your web application on a platform that makes it easy to add instances behind a load balancer, such as Kubernetes or an equivalent cloud service. When demand becomes too high, adding more instances is easier than over-optimizing code.
Use your resources wisely
If there is any resource that cannot scale horizontally, keep its workload to a minimum. Direct consumption of that resource will eventually become a bottleneck.
Do not overuse memory or processing power, especially on non-scalable resources. Trying to improve performance by consuming more memory or processing power can hurt scalability in the long run, and should be done with caution.
Optimize your use of external resources. For example, when using a SQL database, make proper use of indexes, avoid huge read/write operations, and avoid loading too much data into memory through temporary tables.
Use asynchronous programming to make better use of threads, for example by using the C# async and await keywords. This link leads to an article about asynchrony.
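As a minimal sketch of this tip, the example below uses a hypothetical FetchLengthAsync method that stands in for any I/O-bound call (database, HTTP, disk); Task.Delay only simulates the wait. All names here are illustrative assumptions, not part of any real API.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncSketch
{
    // Hypothetical stand-in for an I/O-bound call.
    // Task.Delay simulates the wait without blocking a thread.
    public static async Task<int> FetchLengthAsync(string key)
    {
        await Task.Delay(100); // the thread is free to serve other work while waiting
        return key.Length;
    }

    // Starts every call first and awaits them together, so the waits overlap
    // instead of running one after another.
    public static async Task<int> TotalLengthAsync(params string[] keys)
    {
        Task<int>[] pending = keys.Select(k => FetchLengthAsync(k)).ToArray();
        int[] lengths = await Task.WhenAll(pending);
        return lengths.Sum();
    }
}
```

With a blocking call in place of the await, each request would hold a thread for the whole wait, so the number of concurrent requests would be capped by the number of available threads.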
Stream big collections instead of loading them entirely into memory, for example by using the C# yield keyword. This link leads to an article about yielding and IEnumerable.
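One way to picture streaming with yield is the sketch below. It assumes the data comes from a TextReader; the class and method names are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class StreamingSketch
{
    // Yields one line at a time, so only the current line is held in memory.
    public static IEnumerable<string> ReadLines(TextReader reader)
    {
        while (reader.ReadLine() is string line)
        {
            yield return line;
        }
    }

    // Consumes the sequence lazily: lines are read, tested and discarded
    // one by one, without ever building a full list.
    public static int CountMatching(TextReader reader, string term)
    {
        return ReadLines(reader).Count(line => line.Contains(term));
    }
}
```

Calling ToList() on the result of ReadLines would bring the whole collection back into memory, so the benefit only holds while the consumer also processes items one at a time.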
Use infrastructure to your advantage
It is good to cache data or results that change infrequently. Caches are relatively easy to manage, and cached results not only avoid consuming infrastructure resources to generate the same data every time, but also speed up processing. This is especially useful when caching results from non-scalable resources.
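A minimal in-memory cache with time-based expiration could look like the sketch below. TtlCache is a hypothetical class written for this example, built on the standard ConcurrentDictionary, and the expiration policy (a fixed time to live per entry) is an assumption.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class TtlCache<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, (TValue Value, DateTime ExpiresAt)> _entries =
        new ConcurrentDictionary<TKey, (TValue Value, DateTime ExpiresAt)>();
    private readonly TimeSpan _ttl;

    public TtlCache(TimeSpan ttl)
    {
        _ttl = ttl;
    }

    // Returns the cached value while it is still fresh; otherwise calls the
    // factory (the expensive operation) and stores the new result.
    public TValue GetOrAdd(TKey key, Func<TKey, TValue> factory)
    {
        DateTime now = DateTime.UtcNow;
        if (_entries.TryGetValue(key, out var entry) && entry.ExpiresAt > now)
        {
            return entry.Value;
        }

        TValue value = factory(key);
        _entries[key] = (value, now + _ttl);
        return value;
    }
}
```

This sketch never evicts expired entries that are not requested again, and two callers that miss at the same moment will both run the factory. A production cache would need to handle both cases.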
Apply request coalescing to non-mutating, frequently called functions. Request coalescing means that concurrent function calls that predictably return the same result do not need to be executed twice: attach the second request to the first and return the same data to both. I created a thread-safe C# library that allows both request coalescing and in-memory cache, called TimedDictionary.
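To illustrate the idea, here is an independent sketch of request coalescing. It is not the API of TimedDictionary; RequestCoalescer and its members are hypothetical names, and the approach (one shared in-flight Task per key) is only one possible way to do it.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public sealed class RequestCoalescer<TKey, TValue>
{
    // One in-flight task per key. Lazy ensures the work starts only once,
    // even if two threads race inside GetOrAdd.
    private readonly ConcurrentDictionary<TKey, Lazy<Task<TValue>>> _inFlight =
        new ConcurrentDictionary<TKey, Lazy<Task<TValue>>>();

    // Callers that arrive while a call for the same key is still running
    // receive the task of that first call instead of starting a new one.
    public Task<TValue> RunAsync(TKey key, Func<TKey, Task<TValue>> work)
    {
        Lazy<Task<TValue>> entry = _inFlight.GetOrAdd(
            key,
            k => new Lazy<Task<TValue>>(() => ExecuteAndRemoveAsync(k, work)));
        return entry.Value;
    }

    private async Task<TValue> ExecuteAndRemoveAsync(TKey key, Func<TKey, Task<TValue>> work)
    {
        try
        {
            return await work(key);
        }
        finally
        {
            // Once finished, forget the entry so later calls run the work again.
            _inFlight.TryRemove(key, out _);
        }
    }
}
```

Unlike a cache, this keeps nothing after the call completes: it only merges requests that overlap in time, which is why it is safe only for functions that do not change state.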
Execute only what is necessary, and avoid using generic functions that generate and return much more data than is required to fulfill the request. Try to keep a balance between practicality and optimization.