Building Blocks of a Scalable Architecture

Based on this article

Introduction

Modern online application development requirements are driven by the need for a highly-scalable and performance-centric platform. Like System Functionality Requirements, the NFR's (Non Functional Requiments) like scalability, performance and availability are given equal importance. For many years IT industry has been struggling to build highly scalable systems. In this struggle it has learned many good architecture and design principles.

This document captures some of these learning which are most frequently used from a very high level view. These learning has been categorized into design principles, design rules, design patterns, design antipatterns and building blocks of highly scalable online platforms. Followings are the high level definitions of these.

Design Principles are the fundamental design laws to be followed to build scalable systems. Design rules, patterns and anti-patterns are derived from one or more of these principles.
Design Rules are second level design laws that tells you of what to do and what not to do based on the past learning or what worked and what did not work.
Design Patterns are general reusable solutions that have been discovered in the past for building scalable systems.
Design Anti Patterns: are common design solutions which are proved to be ineffective for building scalable systems.
Building blocks: are commonly used infrastructure software, tools, frameworks, and services that can be used to build a scalable system.

Principles

Scalability principles are basic proposition, behavior, and properties of scalable systems. All scalability design patterns, rules, and anti-patterns are derived from these principles. If understood and used rationally we can design scalable systems without learning lot of intricacies and details of scalable systems. Before choosing any architectural and design option, consider these principles.

Simplicity

of design not only simplifies the scalability but also simplifies development, deployment, maintenance and support.

Decomposition

Decompose the system into smaller manageable subsystems. Each subsystem can carry out independent function. The subsystems should be able to independently run in a separate process or threads and enabled to scale using various load balancing and other form or tuning techniques.

Asynchronous

Asynchronous processing enables process execution without blocking on resources. Asynchronous prcessing comes with overhead as it is relatively complex to design and test.

Loose Coupling High Cohesion

Reducing coupling and increasing cohesion are two key principles to increase application scalability. Coupling is a degree of dependency at design or run time that exists between subsystems. Coupling can limit scalability due to server and resource affinity. In addition, loose coupling provides greater flexibility to independently choose optimized strategies for performance and scalability for different subsystems.

Weak cohesion among subsystems tends to result in more round trips because the classes or components are not logically grouped and may reside in different tiers. This can force you to require a mix of local and remote calls to complete a logical operation. So the sequence of interactions between subsystems becomes complex and chatty which reduces the scalability. therefore, each subsystem should be designed to work independently with minimum dependencies with other subsystems.

Concurrency and Parallelization

Concurrency is when multiple tasks performed simultaneously with shared resources. Parallelization is when single task divided into multiple simple independent tasks which can be performed simultaneously.

Parsimony

Parsimony means that an architect and developer must be economical towards the system resources in their design and implementations. They should try to use system resources (CPU, disk, memory, network, database connection etc) as effectively and efficiently as possible. It also means that scarce resources must be used carefully. Such resources might be cached or pooled and multiplexed. This principle pervades all the other things. No matter how well a system is architeched and designed, if system resources are not used carefully, application scalability and performance suffers.

Decentralization/Distribution

By definition a distributed system is a collection of subsystems running on independent servers that appears to its users as a single coherent system. Distributed systems offers high scalability and high availability by adding more servers.

Rules

Here are some common design rules derived from design principles:

Ensure your design works if scale changes by 10 times or 20 times;
Do not use bleeding edge technologies;
Optimize the design for the most frequent or important tasks;
Design for horizontal scalability;
Design to use commodity systems;
Design to Leverage the Cloud;
Use caches wherever possible;
Design scale into your solution;
Simplify the solution;
Performing I/O, whether disk or network, is typically the most expensive operation in a system;
Transactions use costly resources;
Use back of the envelope calculations to choose best design.

Antipatterns

Description	Problem	Solution
Excessive processing	Consumes resources which can be used by other transactions. Increases response time Decreases throughput	Remove, postpone (asynchronous), prioritize, or reorder the processing step Leverage caching to reuse loaded or calculated data
Presentation of large set of data to users	Typically OLTP users do not consume large amount of data so it is wastage of processing resources. Severe system performance issue can arise depending on the volume of data	Paginate results Narrow down search criteria
Not releasing resource immediately after use	Cause of connection leaks, deadlocks, performance degradation and other unexpected behavior Threads, sockets, database connection, file handler, and other resources can be a victim	Use finally block to ensure release Release resource as quickly as possible
Inefficient UI design	Additional work for the system Unnecessary large data presentation to user More number of pages presentation	Refractor UI
Unnecessary data retrieval	Unnecessary database, disk and network resources utilization	Select only fields and rows from database which are required.
Incompatibility with HA deployment	Application should be designed to support high availability and multi instance deployment.
Too many objects in memory	Unnecessary memory utilization Long garbage collection pause	Use singleton design pattern Create stateless services Reuse data objects Set cache expire time
No data caching	Unnecessary remote calls Unnecessary processing to calculate or transform data Increase database load Slow performance and scalability	Cache most frequently and read mostly data. Cache complex objects graphs to avoid processing.
Unlimited Resource usages
Usage of outdated tools/technology/API
Poor database schema design.	Expensive SQL that do not scale.	Do not over normalize database. Create summary tables for reports Use NoSQL database if required.
Poor transaction design.	Can cause locking and serialization problems.	Avoid transactions as much as possible. Avoid distributed transaction Avoid long lived transactions
Scaling through 3rd parties software	If you are relying on a vendor for your ability to scale such as with a database cluster you are asking for problems.	Use clustering and other vendor features for availability, plan on scaling by dividing your users onto separate devices, sharding.
Monolithic databases	If your data get big enough you will need the ability to split your database.
Underestimated network latency
Excessive layering	Each layer creates many temporary objects e.g. DTOs, consumes processing for data transformation and consumes network bandwidth if layers are spread across servers.	Simplify design
Excessive network round tripping, Chatty Services	High network usage Slow response	Design coarse grain services Design stateless services Try to query data from database with minimum number of interactions. Avoid fetching unnecessary data from database Cache data or service responses wherever possible
Overstuffed Session	Will use large memory even for inactive users till the session is destroyed. Application server will be able to handle less number of concurrent users. If we are using application server clustering there would be lot of network overhead.	Design application as stateless as possible. Use cache, Cookies, hidden fields, URL query parameters etc.

Design Patterns


Shared nothing architecture	Shared nothing architecture (SNA) is horizontal scalability architecture. Each node in SNA has its own memory, disks and input/output devices. Each node is self sufficient and shares nothing across the network. This reduces the any kind of contention among nodes as there is no scope for data or any other kind of resource sharing. This type of architecture is highly scalable for web applications. SNA partition its different layers (Web server, App Server, DB) to handle the incoming user requests based on many different policies such as geographic area, type of users etc. Google has implemented this which has enabled it to scale its web applications effectively by simply adding nodes.
Database sharding	Database sharding is a shared nothing horizontal database partitioning design pattern. Each database shard can be placed on separate machine or multiple shards can reside on single machine. This distributes data on multiple machines which means that database load is spread out on multiple machines which greatly improves the performance and scalability. Some of the advantages of sharding are Massive scalability, High availability, Faster queries; More write bandwidth, reduced cost as databases can run on commodity servers.
Master slave database	If your application is read heavy and does not require horizontal write scalability you can use master slave database replication. Many popular database provides this feature out of the box e.g. MySQL, Postgres etc.
Optimistic concurrency Optimistic locking
Eventual Consistency
Pooling and multiplexing	Pooling is an effective way to use expensive resources for example, large object graphs, database connections, threads.
Near real-time synchronization of data	A delay of few seconds and more should be acceptable for most of the integration systems so convert real time synchronous distributed transactions into near real time asynchronous one.
Just-enough data distribution	Distribute out as little data as possible. For an object to be distributed outward, it must be serialized and passed through memory or over a network. This involves three system resources: CPU utilization and memory in the server to serialize the object and possibly packetize it for travel across the network, network bandwidth or interprocess communication activity to actually transmit to the receiver, CPU utilization and memory in the receiver to (possibly) unpacketize, deserialize, and reconstruct the object graph. Hence, an object’s movement from server to receiver comes at a fairly high cost.
Use compression before sending data over a network
Data archival	Keep current most frequently used online data separate from old less frequently data. For this you may need to refractor UI.
Intelligent load distribution	Incoming HTTP requests redirect to the mirrored facilities based on some combination of available server and network capacity. This can be accomplished internally or by subscribing to one of the commercial providers who specialize in this type of service.
Graceful service degradation
Map reduce
Autonomy	The system is designed such that individual components can make decisions based on local information.
Failure tolerant:	The system considers the failure of components to be a normal mode of operation, and continues operation with no or minimal interruption.
Collocation	Reduce any overheads associated with fetching data required for a piece of work, by collocating the data and the code.
Caching	If the data and the code can't be collocated, cache the data to reduce the overhead of fetching it over and over again.
Remoting	Reduce the amount of time spent accessing remote services by, for example, making the interfaces more coarse-grained. It's also worth remembering that remote vs local is an explicit design decision not a switch and to consider the first law of distributed computing - do not distribute your objects.
load balancing	Spreading the load across many instances of system/subsystem/component for handling the requests.
Queuing
Batch processing	Achieve efficiencies of scale by processing batches of data, usually because the overhead of an operation is amortized across multiple request
Relax data constraint	Many different techniques and trade-offs with regards to the immediacy of processing / storing / access to data fall in this strategy

Building Blocks of Scalability


Load Balancers	Hardware load balancers Software load balancers
Messaging queues	MSMQ, MQSeries
Caches	Memcache, ehcache
Reverse proxies	Squid, Varnish
CDN	Akamai
NoSQL databases	MOngo
Embeded databases	SQL Express
Cloud	AWS, Azure
Language features	Concurrency, queues, locks, asynchronous, thread pools
Frameworks	Akka, Storm, Kafka

Building Blocks of a Scalable Architecture

Introduction

Principles

Simplicity

Decomposition

Asynchronous

Loose Coupling High Cohesion

Concurrency and Parallelization

Parsimony

Decentralization/Distribution

Rules

Antipatterns

Design Patterns

Building Blocks of Scalability

Popular

Recent content

Introduction

Principles

Simplicity

Decomposition

Asynchronous

Loose Coupling High Cohesion

Concurrency and Parallelization

Parsimony

Decentralization/Distribution

Rules

Antipatterns

Design Patterns

Building Blocks of Scalability

User login

Search

Popular

Recent content