Caching Strategies: The Systematic Way to Make Your App 10x Faster
Amazon's research found that every additional 100ms of page load time cut sales by 1%. Google found that a 0.5-second delay in delivering search results reduced traffic by 20%.
Speed directly impacts business outcomes. And the most economical way to speed things up is usually not more servers — it's smarter caching.
What Is Cache and Why Is It Necessary?
A cache keeps a copy of frequently accessed data in a place that is faster to reach. Instead of going to the original source on every request, data is served from that nearby copy.
Why necessary? Because the speed difference between layers is dramatic:
Memory (RAM) access: ~100 nanoseconds
SSD disk read: ~100 microseconds (1,000x slower)
Network request (same region): ~1 millisecond (10,000x slower)
Database query: ~10 milliseconds (100,000x slower)
External API call: ~100 milliseconds (1,000,000x slower)
Caching a database query result in memory can be 100,000 times faster than running that query every time.
Layer 1: Browser Cache
The cheapest cache is the one in the user's browser. No request even reaches the server.
Controlled via HTTP response headers:
```
Cache-Control: max-age=86400, public
ETag: "abc123def456"
Last-Modified: Wed, 05 Mar 2025 10:00:00 GMT
```
Cache-Control directives:
max-age=86400 — cache this resource for 86,400 seconds (1 day). Within this period the browser won't contact the server at all; it serves the resource straight from its local cache.
public — both browser and intermediate proxies can cache. Ideal for static assets.
private — only the user's browser can cache. For personal data.
no-store — never cache. For sensitive financial data.
must-revalidate — once the cached copy expires, it must be revalidated with the server before being used again.
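As an illustration of how these directives map to resource types, here is a small policy helper. This is a sketch; the function name and the exact file-extension list are assumptions, not from the original:

```python
def cache_control_for(path: str, personalized: bool = False) -> str:
    """Hypothetical helper: choose a Cache-Control value per resource type."""
    if personalized:
        # Personal data: only the user's own browser may cache it
        return "private, max-age=0, must-revalidate"
    if path.endswith((".css", ".js", ".png", ".woff2")):
        # Static assets: safe for browsers and shared proxies to cache
        return "public, max-age=86400"
    # Default for dynamic or sensitive responses: never store
    return "no-store"
```

A checkout page would fall through to no-store, while style.css gets a day of public caching.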
ETag with conditional requests:
First request:
Browser → GET /style.css
Server ← 200 OK, ETag: "v2.1"
Second request (cache expired):
Browser → GET /style.css, If-None-Match: "v2.1"
Server ← 304 Not Modified (file unchanged, no re-download)
A 304 response carries no body, just headers: almost no bandwidth, minimal latency.
Versioning strategy: embed a content hash in static file URLs, e.g. style.abc123.css. You can then set max-age very long, because when the file changes the URL changes too, so stale copies are simply never requested again.
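A build step computing such a hashed filename might look like this sketch (the helper name is hypothetical; real bundlers do this for you):

```python
import hashlib
import pathlib


def versioned_name(path: str) -> str:
    """Hypothetical build step: embed a short content hash in the filename."""
    digest = hashlib.sha256(pathlib.Path(path).read_bytes()).hexdigest()[:8]
    stem, _, ext = path.rpartition(".")
    # style.css -> style.<hash>.css
    return f"{stem}.{digest}.{ext}"
```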
Layer 2: CDN Cache
A CDN (Content Delivery Network) caches content on servers geographically close to the user. A user in London gets served from a Frankfurt CDN node instead of going all the way to your origin server in the US.
The impact is dramatic: 200ms latency can drop to 10ms. Static files, images, video — serving all of these from a CDN both speeds things up and reduces the load on your origin server.
How do you control CDN cache?
```
Cache-Control: public, max-age=31536000, s-maxage=86400
```
s-maxage applies only to shared caches (CDNs and proxies). The browser honors max-age; the CDN honors s-maxage.
```
Surrogate-Control: max-age=3600
Cache-Tag: product-123, category-electronics
```
With Cache-Tag, you can instantly invalidate specific content groups. Product updated? Delete all CDN cache entries carrying the product-123 tag.
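The mechanics of tag-based purging can be modeled with a toy in-memory structure. This is purely illustrative (the CDN does this on its edge nodes; the class and method names are invented):

```python
from collections import defaultdict


class TaggedCache:
    """Toy in-memory model of CDN tag-based purging (illustrative only)."""

    def __init__(self):
        self.store = {}
        self.keys_by_tag = defaultdict(set)

    def set(self, key, value, tags=()):
        self.store[key] = value
        for tag in tags:
            self.keys_by_tag[tag].add(key)

    def get(self, key):
        return self.store.get(key)

    def purge_tag(self, tag):
        # Drop every cached entry carrying this tag in one operation
        for key in self.keys_by_tag.pop(tag, set()):
            self.store.pop(key, None)
```

Purging product-123 removes the product page but leaves other pages under category-electronics untouched.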
Layer 3: Application Cache (Redis)
Redis is used to cache the results of database queries, external API calls, and computed operations at the application layer.
Cache-Aside (Lazy Loading) — The Most Common Pattern:
```python
def get_product(product_id):
    cache_key = f"product:{product_id}"

    # 1. Check cache first
    cached = redis.get(cache_key)
    if cached:
        return json.loads(cached)

    # 2. Not in cache, get from DB
    product = db.query("SELECT * FROM products WHERE id = %s", product_id)

    # 3. Write to cache (1 hour TTL)
    redis.setex(cache_key, 3600, json.dumps(product))
    return product
```
Advantage: only data that's actually requested gets cached. Disadvantage: the first request for any key is always slow (a cache miss).
Write-Through — Cache and DB Always in Sync:
```python
def update_product(product_id, data):
    # Update both DB and cache
    db.execute("UPDATE products SET ... WHERE id = %s", product_id)
    redis.setex(f"product:{product_id}", 3600, json.dumps(data))
```
The cache is always current. But for data that is written often and read rarely, this means unnecessary cache writes.
Cache Warming — Preventing Cold Start:
When the application restarts or a new node is added, the cache is empty. During this window every request falls through to the DB, a cache stampede risk. The solution: preload critical data into the cache.
```python
def warm_cache():
    popular_products = db.query(
        "SELECT * FROM products ORDER BY view_count DESC LIMIT 1000"
    )
    for product in popular_products:
        redis.setex(f"product:{product.id}", 3600, json.dumps(product))
```
Cache Invalidation: The Hardest Problem
As Phil Karlton famously put it: there are only two hard things in computer science, cache invalidation and naming things.
What happens when data in the cache changes? Old data keeps being served — the stale data problem.
TTL (Time To Live): The simplest approach. Give each cache entry a lifetime and let it expire automatically. The drawback: after an update, stale data keeps being served until the TTL runs out.
Event-Based Invalidation: Actively delete relevant cache entries when data changes.
```python
def update_user_profile(user_id, data):
    db.execute("UPDATE users SET ... WHERE id = %s", user_id)

    # Clear all related cache keys
    redis.delete(f"user:{user_id}")
    redis.delete(f"user:{user_id}:profile")
    redis.delete(f"user:{user_id}:orders")
```
Cache Versioning: Add a version to cache keys. Increment the version when data changes.
```python
cache_version = redis.get("cache:version:products") or 1
cache_key = f"product:{product_id}:v{cache_version}"
```
To invalidate the entire product cache, just increment the version number — old keys will never be queried again and will be deleted when their TTL expires.
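The whole scheme can be sketched with a plain dict standing in for Redis. This is a toy model (class and method names are invented); in a real deployment the version counter lives in Redis as shown above:

```python
class VersionedCache:
    """Dict-backed sketch of cache namespace versioning (illustrative)."""

    def __init__(self):
        self.data = {}
        self.version = 1

    def _key(self, product_id):
        return f"product:{product_id}:v{self.version}"

    def get(self, product_id):
        return self.data.get(self._key(product_id))

    def set(self, product_id, value):
        self.data[self._key(product_id)] = value

    def invalidate_all(self):
        # Old keys are never read again; in Redis a TTL reclaims them
        self.version += 1
```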
What Should You Not Cache?
A cache doesn't solve every problem, and not every piece of data is suitable for caching.
Data that must be real-time and current shouldn't be cached: stock prices, live location data, real-time inventory counts. Data that is user-specific and changes frequently benefits little from caching. And for data that is cheap to compute, the overhead of maintaining a cache can exceed the gain.
Cache Size and Eviction Policies
When Redis runs out of memory, old data must be deleted to add new data. Which data gets deleted?
LRU (Least Recently Used) — data that hasn't been accessed for the longest time is deleted. Good default for most scenarios.
LFU (Least Frequently Used) — the least-accessed data is deleted. Keeps consistently popular data in memory even if it hasn't been touched very recently.
TTL-based — data with the soonest expiry is deleted first.
Set in the Redis config with maxmemory-policy allkeys-lru (alternatives include allkeys-lfu and volatile-ttl).
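The LRU idea itself is simple enough to sketch in a few lines. This is a minimal model of what allkeys-lru does, not how Redis implements it (Redis uses an approximated sampling algorithm):

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache sketch: evicts the least recently used entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used
```

With capacity 2, reading "a" before inserting "c" means "b" is the one evicted.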
When done right, caching makes your application faster, your database less loaded, and your infrastructure costs lower. When done wrong, it produces stale data, cache stampedes, and hard-to-debug inconsistencies. The difference lies in which data you cache, where, for how long, and with what invalidation strategy.