
System Design Fundamentals: Scalability, Load Balancing, Caching and Databases

Master system design fundamentals for technical interviews. Covers horizontal vs vertical scaling, load balancers, caching strategies, database sharding, CAP theorem, and designing for availability.

Yusuf Seyitoğlu · March 12, 2026 · 13 min read


System design interviews evaluate your ability to architect large-scale distributed systems. Unlike coding interviews with clear right/wrong answers, system design is about reasoning through trade-offs. This guide covers the building blocks and concepts that appear in almost every system design discussion.

How to Approach System Design Interviews

A good framework:

  1. Clarify requirements: functional (what it does) and non-functional (scale, latency, availability)
  2. Estimate scale: DAU, requests per second, data volume
  3. High-level design: draw the major components
  4. Deep dive: detail the components your interviewer cares about
  5. Identify bottlenecks: what breaks at scale, and how do you fix it?

Vertical vs Horizontal Scaling

Vertical scaling (scaling up): Give the server more RAM, CPU, or faster disks.

  • Simple: no code changes
  • Hard limit: you cannot scale a single machine forever
  • Single point of failure

Horizontal scaling (scaling out): Add more servers.

  • No hard limit: add servers as needed
  • Requires stateless application design
  • More complex: load balancing, session management, data consistency

Modern systems scale horizontally. Your application servers should be stateless: any request can be handled by any server. Store session data in Redis, not in-process memory.
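As a sketch of what "stateless" means in practice, the handler below reads session state from a shared external store on every request, so any server can serve any user. A plain dict stands in for Redis here, and `SessionStore`/`handle_request` are illustrative names, not a real API:

```python
class SessionStore:
    """Stand-in for Redis: a shared key-value store (TTL handling omitted)."""
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def handle_request(server_name, store, session_id):
    """Stateless handler: all session state comes from the shared store."""
    session = store.get(session_id)
    if session is None:
        return f"{server_name}: unauthenticated"
    return f"{server_name}: hello {session['user']}"


store = SessionStore()                  # shared by every app server
store.set("sess-1", {"user": "alice"})

# The same session works no matter which server the load balancer picks.
print(handle_request("server-1", store, "sess-1"))  # server-1: hello alice
print(handle_request("server-2", store, "sess-1"))  # server-2: hello alice
```

If the session lived in one server's process memory instead, the load balancer would have to pin each user to that server, defeating horizontal scaling.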

Load Balancers

A load balancer distributes incoming traffic across multiple servers.

code
                    ┌───────────────┐
Client ────────────►│ Load Balancer │
                    └───────┬───────┘
          ┌─────────────────┼─────────────────┐
          ▼                 ▼                 ▼
       Server 1          Server 2          Server 3

Load balancing algorithms

  • Round robin: distribute requests evenly in sequence (good for uniform workloads)
  • Least connections: send to the server with the fewest active connections (good for variable request durations)
  • IP hash: hash the client IP to a consistent server (useful for session stickiness)
  • Weighted round robin: distribute proportionally to server capacity
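The first two algorithms above can be sketched in a few lines (the server names and connection counts are made up for illustration):

```python
import itertools

servers = ["s1", "s2", "s3"]

# Round robin: cycle through the servers in order.
rr = itertools.cycle(servers)
picks = [next(rr) for _ in range(5)]
print(picks)  # ['s1', 's2', 's3', 's1', 's2']

# Least connections: pick the server with the fewest active connections.
def least_connections(active):
    return min(active, key=active.get)

print(least_connections({"s1": 7, "s2": 2, "s3": 5}))  # s2
```

Round robin needs no state about the backends; least connections needs the balancer to track in-flight requests, which is why it copes better with variable request durations.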

Layer 4 vs Layer 7

  • L4 (transport layer): routes based on IP/TCP; very fast, no content inspection
  • L7 (application layer): routes based on HTTP headers, URL, or cookies, enabling content-aware routing (send /api/* to API servers, /static/* to a CDN)

AWS ALB, Nginx, and HAProxy are L7 load balancers. AWS NLB is L4.
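The content-aware routing an L7 balancer performs boils down to matching request attributes against rules. A toy version, with an invented route table and pool names (real LBs express this in config, e.g. Nginx `location` blocks):

```python
# Prefix → backend pool, checked in order; first match wins.
ROUTES = [
    ("/api/", "api-servers"),
    ("/static/", "cdn"),
]

def route(path, default="web-servers"):
    """Pick a backend pool for a request path (L7-style routing)."""
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    return default

print(route("/api/users/42"))   # api-servers
print(route("/static/app.js"))  # cdn
print(route("/home"))           # web-servers
```

An L4 balancer cannot do this: it never parses HTTP, so the path is invisible to it.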

Caching

Caching stores copies of frequently accessed data closer to the requester, reducing latency and database load.

Cache-Aside (Lazy Loading)

code
App → check cache → hit:  return cached data
                  → miss: query DB → store in cache → return data

Most common pattern. Cache only what is actually requested.
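A minimal cache-aside sketch, with dicts standing in for Redis and the database, and a counter to show the cache absorbing repeat reads:

```python
cache = {}
db = {"user:1": {"name": "alice"}}
db_queries = 0  # count DB hits to demonstrate the cache working

def get_user(key):
    global db_queries
    if key in cache:            # hit: serve straight from cache
        return cache[key]
    db_queries += 1             # miss: query the database...
    value = db.get(key)
    if value is not None:
        cache[key] = value      # ...and populate the cache on the way out
    return value

get_user("user:1")   # miss: goes to the DB
get_user("user:1")   # hit: served from cache
print(db_queries)    # 1
```

In production you would also set a TTL when populating the cache, so stale entries eventually expire even if invalidation is missed.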

Write-Through

Write to cache and DB simultaneously. Cache is always fresh but adds write latency.

Write-Behind (Write-Back)

Write to cache immediately, flush to DB asynchronously. Low write latency but risk of data loss if cache goes down before flush.

Cache Eviction Policies

  • LRU (Least Recently Used): evict the item not accessed for the longest time
  • LFU (Least Frequently Used): evict the item accessed least often
  • TTL (Time to Live): expire items after a fixed duration
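LRU is easy to build on `collections.OrderedDict`, which remembers insertion order and can move keys to the end in O(1). A sketch (capacity 2 just for demonstration; not a production cache):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._d = OrderedDict()

    def get(self, key):
        if key not in self._d:
            return None
        self._d.move_to_end(key)         # mark as most recently used
        return self._d[key]

    def put(self, key, value):
        if key in self._d:
            self._d.move_to_end(key)
        self._d[key] = value
        if len(self._d) > self.capacity:
            self._d.popitem(last=False)  # evict the least recently used item

c = LRUCache(2)
c.put("a", 1)
c.put("b", 2)
c.get("a")          # touch "a", so "b" is now least recently used
c.put("c", 3)       # over capacity: evicts "b"
print(c.get("b"))   # None
print(c.get("a"))   # 1
```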

CDN (Content Delivery Network)

CDNs cache static assets (images, JS, CSS) at edge servers geographically close to users:

code
User (Tokyo) ──► CDN Edge (Tokyo) ──► Origin Server (US)  [on miss]
                      │
               cache hit: 5ms vs 150ms round trip to origin

Use a CDN for all static assets, and for cacheable API responses when possible.

Cache Invalidation

The hardest problem in caching. Strategies:

  • TTL-based expiry: simple, but stale data is possible
  • Event-driven invalidation: when data changes, explicitly delete or update the affected cache keys
  • Cache-aside with a short TTL: accept eventual consistency

Database Scaling

Read Replicas

Most web applications read far more than they write. Add read replicas to distribute read load:

code
Writes ──────────────► Primary DB
                           │ replication
                      ┌────┴────┐
                      ▼         ▼
Reads ──────────► Replica 1  Replica 2

Tradeoff: replication is asynchronous, so replicas may be slightly behind the primary (replication lag).

Database Sharding

Horizontally partition data across multiple database instances. Each shard holds a subset of the data.

Hash sharding: shard = hash(user_id) % num_shards

code
User 1 ──► Shard 0
User 2 ──► Shard 1
User 3 ──► Shard 0
User 4 ──► Shard 2

Range sharding: Users A–H go to Shard 0, I–P to Shard 1, Q–Z to Shard 2.

Sharding enables near-unlimited horizontal scale but complicates cross-shard queries and transactions.
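A concrete hash-sharding routing function, as a sketch. Note it uses a stable hash (MD5) rather than Python's built-in `hash()`, which is randomized per process and therefore unusable for routing; the function name is illustrative:

```python
import hashlib

def shard_for(user_id, num_shards):
    """Map a key to a shard index: hash(key) % num_shards."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % num_shards

NUM_SHARDS = 4
for uid in [1, 2, 3, 4]:
    print(f"user {uid} -> shard {shard_for(uid, NUM_SHARDS)}")

# The same key always routes to the same shard:
assert shard_for(42, NUM_SHARDS) == shard_for(42, NUM_SHARDS)
```

The `% num_shards` step is also the weakness of naive hash sharding: changing the shard count remaps almost every key, which is why systems that reshard often use consistent hashing instead.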

SQL vs NoSQL for Scale

SQL databases (PostgreSQL, MySQL) are strongly consistent and support complex queries. They scale well with read replicas and vertical scaling, but horizontal sharding is complex.

NoSQL databases (DynamoDB, Cassandra, MongoDB) are designed for horizontal scaling from the ground up. They trade ACID guarantees and join support for massive throughput and availability.

Choose based on your access patterns:

  • Complex queries, strong consistency → SQL
  • High write throughput, simple lookups by key → NoSQL
  • Flexible schema, document model → MongoDB
  • Time-series data → TimescaleDB or InfluxDB

CAP Theorem

In a distributed system, you can only guarantee two of three:

  • Consistency (C): every read returns the most recent write
  • Availability (A): every request receives a response (not guaranteed to be current)
  • Partition Tolerance (P): the system keeps working even if some nodes cannot communicate

Network partitions happen in real distributed systems; you cannot avoid P. So the real choice is CP vs AP:

  • CP systems (prefer consistency): ZooKeeper, HBase. During a partition, they reject some requests.
  • AP systems (prefer availability): Cassandra, DynamoDB. During a partition, they may return stale data.

For most web apps, AP with eventual consistency is acceptable (showing a slightly stale tweet count is fine). For financial transactions, CP is required (cannot show wrong account balance).

Availability and Reliability

Availability is usually expressed in "nines":

Availability          Downtime per year
99% (2 nines)         3.65 days
99.9% (3 nines)       8.77 hours
99.99% (4 nines)      52 minutes
99.999% (5 nines)     5.26 minutes
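The table is just arithmetic on the allowed failure fraction, which you can reproduce in a couple of lines:

```python
HOURS_PER_YEAR = 365.25 * 24  # 8766

def downtime_hours(availability_pct):
    """Hours of downtime per year allowed at a given availability."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR

print(round(downtime_hours(99.0) / 24, 2))    # 3.65 (days)
print(round(downtime_hours(99.9), 2))         # 8.77 (hours)
print(int(downtime_hours(99.99) * 60))        # 52 (minutes)
print(round(downtime_hours(99.999) * 60, 2))  # 5.26 (minutes)
```

Each extra nine cuts the budget by 10×, which is why five nines leaves barely five minutes a year for every deploy, failover, and incident combined.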

Patterns for high availability

Redundancy: No single points of failure. Multiple app servers, database replicas, multi-AZ deployments.

Health checks and auto-healing: Load balancers stop sending traffic to unhealthy instances. Auto-scaling groups replace failed instances automatically.

Circuit breaker: Prevent cascading failures. When a downstream service fails repeatedly, stop calling it for a period and return a fallback.
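A minimal circuit-breaker sketch of the pattern just described (class name, thresholds, and the fallback convention are all illustrative choices, not a standard API):

```python
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None = closed (calls allowed)

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                return fallback          # open: fail fast, skip the call
            self.opened_at = None        # half-open: allow one trial call
            self.failures = 0
        try:
            result = fn()
            self.failures = 0            # success closes the circuit
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            return fallback

def flaky():
    raise ConnectionError("downstream unavailable")

cb = CircuitBreaker(failure_threshold=2)
print(cb.call(flaky, fallback="cached"))  # cached (1st failure)
print(cb.call(flaky, fallback="cached"))  # cached (2nd failure: circuit opens)
print(cb.opened_at is not None)           # True: further calls now fail fast
```

The point is the open state: once tripped, callers get the fallback immediately instead of piling more load and latency onto a struggling downstream service.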

Graceful degradation: If a non-critical component fails (recommendations service, analytics), continue serving core functionality.

Retry with exponential backoff: Automatically retry transient failures with increasing delays.
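The delay schedule for exponential backoff is usually base × 2^attempt, capped, with random jitter added so retrying clients do not synchronize. A sketch that computes the delays (collected rather than slept on, to keep the example fast; in real code you would `time.sleep` each one):

```python
import random

def backoff_delays(retries, base=0.1, cap=10.0):
    """Delay (seconds) before each retry: capped exponential plus jitter."""
    delays = []
    for attempt in range(retries):
        delay = min(cap, base * (2 ** attempt))
        delays.append(delay + random.uniform(0, delay))  # add jitter
        # time.sleep(delays[-1]) would go here in real retry code
    return delays

random.seed(0)  # deterministic for the demo
for d in backoff_delays(5):
    print(round(d, 3))
```

Each delay lands between `base·2^attempt` and twice that, so bursts of failures spread out instead of hammering the recovering service in lockstep.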

Message Queues

Message queues decouple producers from consumers and enable async processing:

code
Web Server ──► Message Queue ──► Worker Service
  (fast)        (Kafka/SQS)     (slow processing)

Benefits:

  • The producer does not block waiting for slow processing
  • Workers can scale independently from the web tier
  • Messages are durable: they are not lost if a worker crashes
  • Traffic spikes are buffered: the queue absorbs bursts, and workers process at their own pace
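The decoupling above can be shown in miniature with the standard library: `queue.Queue` stands in for Kafka/SQS, a thread stands in for the worker service, and the "email" job is invented for the demo:

```python
import queue
import threading

jobs = queue.Queue()
processed = []

def worker():
    """Worker service: drain the queue at its own pace."""
    while True:
        job = jobs.get()
        if job is None:          # sentinel value: shut down
            break
        processed.append(f"sent email to {job}")
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()

# "Web server": enqueue and return immediately, without waiting on the work.
for user in ["alice", "bob"]:
    jobs.put(user)

jobs.put(None)                   # tell the worker to stop
t.join()
print(processed)  # ['sent email to alice', 'sent email to bob']
```

A real broker adds what this toy lacks: persistence across crashes, delivery acknowledgements, and consumers on separate machines.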

Use cases: sending emails, processing images, generating reports, triggering webhooks, propagating events between microservices.

Back-of-Envelope Estimation

Interviewers expect you to estimate scale. Useful numbers to memorize:

Operation                                 Approximate time
L1 cache reference                        0.5 ns
RAM reference                             100 ns
SSD random read                           100 µs
Network round trip (same datacenter)      0.5 ms
Network round trip (cross-continent)      150 ms
HDD seek                                  10 ms

Traffic math: 1M DAU × 10 requests/day = ~116 requests/second. A single well-tuned server handles thousands of req/s for simple APIs.

Storage math: 1M users × 1KB profile = 1GB. 100M photos × 1MB = 100TB.
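The same arithmetic, spelled out so the units are explicit:

```python
# Traffic: requests per second from daily active users.
dau = 1_000_000
req_per_user_per_day = 10
rps = dau * req_per_user_per_day / 86_400  # 86,400 seconds in a day
print(round(rps))  # 116

# Storage: user profiles.
profile_bytes = 1_000                      # ~1 KB per profile
print(dau * profile_bytes / 1e9)           # 1.0 (GB)

# Storage: photos.
photos = 100_000_000
photo_bytes = 1_000_000                    # ~1 MB per photo
print(photos * photo_bytes / 1e12)         # 100.0 (TB)
```

In an interview, round aggressively (a day is ~10^5 seconds); the goal is the order of magnitude, not the third digit.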

Example: Design a URL Shortener

Quick walkthrough of a common interview question:

Functional requirements: Create short URLs; redirect short → long URL.

Non-functional: 100M URLs created/day, 10B redirects/day (read-heavy, 100:1 read/write).

Core components:

code
Client ──► Load Balancer ──► API Servers ──► Cache (Redis)
                                                  │ (miss)
                                                  ▼
                                     Database (short_url → long_url)

Data model:

code
{ short_code: "abc123", long_url: "https://...", created_at, user_id }

Short code generation: Base62 encode a counter, or take first 6 chars of MD5(long_url). Handle collisions.

Scale: 10B redirects/day = ~115,000 req/s. Cache hot short codes in Redis (99%+ of traffic served from cache), so the database sees mostly writes; the hot redirect path rarely touches it.

Practice on Froquiz

System design concepts appear in senior developer and staff engineer interviews. Explore our backend and infrastructure quizzes on Froquiz to reinforce the fundamentals.

Summary

  • Vertical scaling is simple but limited; horizontal scaling requires stateless app design
  • Load balancers distribute traffic; L7 LBs can route by URL/header
  • Cache-aside is the most common caching pattern; always set a TTL
  • Read replicas scale reads; sharding scales writes. Both add complexity
  • CAP theorem: network partitions are inevitable, so choose CP or AP based on your consistency needs
  • Message queues decouple services and absorb traffic spikes
  • Always clarify requirements and estimate scale before drawing architecture in interviews
