Scale From Zero to Millions of Users #
Here, we’re building a system that supports a few users & gradually scales it to support millions.
Single Server Setup #
To start off, we’re going to put everything on a single server - web app, database, cache, etc.
What’s the Request Flow? #
- User asks DNS server for the IP of my site (i.e.
api.mysite.com -> 15.125.23.214
). Usually, DNS is provided by third-parties instead of hosting it yourself. - HTTP requests are sent directly to server (via its IP) from your device.
- Server returns HTML pages or JSON payloads, used for rendering.
Traffic to the web server comes from either a web application or a mobile application:
- Web applications use a combo of server-side languages (i.e. Java, Python) to handle business logic & storage. Client-side languages (i.e. HTML, JS) are used for presentation.
- Mobile apps use the HTTP protocol for communication between mobile & the web server. JSON is used for formatting transmitted data.
Database #
As the user base grows, storing everything on a single server is insufficient. We can separate our database on another server so that it can be scaled independently from the web tier.
Which Databases to Use? #
You can choose either a traditional relational database or a non-relational (NoSQL) one.
- Most popular relational DBs: MySQL, Oracle, PostgreSQL.
- Most popular NoSQL DBs: CouchDB, Neo4J, Cassandra, HBase, DynamoDB.
Relational databases represent & store data in tables & rows. You can join different tables to represent aggregate objects. NoSQL databases are grouped into four categories: key-value stores, graph stores, column stores & document stores.
For most use cases, relational databases are the best option. If not suitable, explore NoSQL databases, which might be better if:
- Application requires super-low latency.
- Data is unstructured or you don’t need any relational data.
- You only need to serialize/deserialize data (JSON, XML, YAML, etc.).
- You need to store a massive amount of data.
Vertical Scaling vs. Horizontal Scaling #
- Vertical scaling (scale up): Adding more power to your servers (CPU, RAM, etc.).
- Horizontal scaling (scale out): Adding more servers to your pool of resources.
Advantages of Horizontal Scaling #
- Can handle larger traffic volumes.
- Avoids the hard limits of vertical scaling.
- Provides failover and redundancy.
Load Balancer #
A load balancer evenly distributes incoming traffic among web servers in a load-balanced set.
How It Works? #
- If one server goes down, all traffic is routed to another server.
- More servers can be added to handle spikes in traffic.
Database Replication #
Database replication is usually achieved via master/slave replication (nowadays called primary/secondary replication). A master database supports writes, while slave databases store copies and support read operations.
Advantages #
- Better performance: Enables more read queries to be processed in parallel.
- Reliability: If one database fails, data is still preserved.
- High availability: Data remains accessible as long as one instance is operational.
If a master or slave database goes offline, the system promotes a new master and adjusts slaves accordingly.
Updated Request Lifecycle #
- User gets the IP address of the load balancer from DNS.
- User connects to the load balancer via IP.
- HTTP request is routed to server 1 or server 2.
- Web server reads user data from a slave database or routes data modifications to the master database.
Cache #
A cache is a temporary storage that stores frequently accessed data or results of expensive computations. In our web application, caching can reduce the need for expensive database queries.
Cache Tier #
The cache tier is a temporary storage layer that can be scaled independently from the database.
Considerations for Using Cache #
- When to use: Useful when data is read frequently but modified infrequently.
- Expiration policy: Controls when cached data expires. Too short leads to frequent DB queries, too long risks stale data.
- Consistency: Ensures the cache is in sync with the database.
- Mitigating failures: Provision servers with extra memory or set them up in multiple locations.
- Eviction policy: Determines what happens when the cache is full. Common policies are LRU, LFU, FIFO.
Content Delivery Network (CDN) #
A CDN is a network of geographically dispersed servers, used for delivering static content (images, HTML, CSS, JS files).
Whenever a user requests static content, the CDN server closest to the user serves it.
Considerations for Using CDN #
- Cost: CDNs are managed by third parties, so be mindful of cost.
- Cache expiry: Setting an appropriate cache expiry to balance request frequency and data staleness.
- CDN fallback: Clients should have a way to fall back if there’s a temporary outage.
- Invalidation: Use APIs or object versioning to invalidate cache.
Stateless Web Tier #
To scale our web tier, we need to make it stateless by storing session data in persistent storage (relational database or NoSQL).
Stateful Architecture #
Stateful servers remember client data across different requests, making them less flexible.
Stateless Architecture #
Stateless servers don’t store user data, allowing HTTP requests to be served by any server.
Data Centers #
Clients are geo-routed to the nearest data center based on their IP address.
In the event of an outage, traffic is rerouted to a healthy data center.
Message Queues #
Message queues enable asynchronous communication and decouple producers from consumers.
Example use-case: Photo processing tasks.
Logging, Metrics, and Automation #
As the web application grows, monitoring tooling becomes essential.
- Logging: Error logs can be emitted to a data store.
- Metrics: Collect various types of metrics to monitor system health.
- Automation: Continuous integration and deployment detect problems early and improve productivity.
Database Scaling #
There are two approaches to database scaling:
Vertical Scaling #
Adding more resources (CPU, RAM, etc.) to your database nodes. This approach has hardware limits and can be expensive.
Horizontal Scaling #
Add more database nodes instead of upgrading a single one. Sharding is a common horizontal scaling technique.
In this setup, data is distributed across shards using a partition key.
Millions of Users and Beyond #
Scaling a system is iterative. Here are key takeaways:
- Keep the web tier stateless.
- Build redundancy at every layer.
- Cache frequently accessed data.
- Support multiple data centers.
- Host static assets in CDNs.
- Scale your data tier via sharding.
- Split your big application into multiple services.
- Monitor your system & use automation.