Latency vs Throughput - System Design Interview Trade-off Guide

⚖️ Latency vs Throughput


🔷 1. Core Idea

  • Latency = Time for one request to complete (ms)
  • Throughput = Requests processed per second (RPS)
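The two metrics are linked by Little's Law: the average number of in-flight requests equals throughput × latency. A minimal sketch, with hypothetical service numbers:

```python
# Little's Law: avg in-flight requests = throughput (req/s) * latency (s).
# The numbers below are made up for illustration.
throughput_rps = 2000      # requests per second
latency_s = 0.050          # 50 ms average latency

concurrency = throughput_rps * latency_s
print(concurrency)         # roughly 100 requests in flight at once
```

This is why the two are in tension: once a resource saturates, pushing more requests through it (throughput) lengthens queues, which shows up as added latency per request.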

🧠 Analogy

🚗 Road:

  • Latency → travel time for one car
  • Throughput → cars passing per hour

🧠 Script

"Latency is how fast a single request completes. Throughput is how many requests the system handles per second."


💡 FAANG Q

Q: Can you always improve both latency and throughput simultaneously?

A: Not always. Optimizing for one often comes at the cost of the other; batching, for example, raises throughput but adds queueing delay to each request.


🔷 2. Latency (Deep Dive)

✅ Key Points

  • Measured in ms (p50, p95, p99)
  • Sources of latency:
    • Network hops
    • DB queries
    • Serialization
    • Queueing

⚠️ Trade-offs

  • ✅ Fast user experience
  • ❌ Hard to reduce at distributed scale
  • ❌ Cross-region adds 100ms+

📌 When to prioritize

  • Interactive apps (chat, trading)
  • User-facing APIs
  • Real-time systems

💡 FAANG Q

Q: How do you reduce latency? A:

  • Caching (avoid DB hits)
  • Async processing (non-critical paths)
  • CDN (reduce network hops)
  • Optimize DB indexes

🔷 3. Throughput (Deep Dive)

✅ Key Points

  • Measured in RPS / QPS
  • Bottlenecks:
    • Single DB node
    • CPU-bound processing
    • Network bandwidth

⚠️ Trade-offs

  • ✅ Handle massive scale
  • ❌ May sacrifice per-request latency
  • ❌ Requires horizontal scaling

📌 When to prioritize

  • Batch systems
  • Log ingestion
  • Streaming pipelines

💡 FAANG Q

Q: How do you increase throughput? A:

  • Horizontal scaling (add servers)
  • Load balancing
  • Async/batch processing
  • Partitioning (sharding)
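The partitioning point can be sketched as hash-based routing: each key maps deterministically to one shard, so writes spread across nodes and total throughput scales with shard count. The shard count and key names here are hypothetical:

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count

def shard_for(key: str) -> int:
    """Hash-based partitioning: route a key to a stable shard."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Each shard handles ~1/NUM_SHARDS of the keys, multiplying write capacity.
for user_id in ("user:1", "user:2", "user:3"):
    print(user_id, "->", shard_for(user_id))
```

Note that modulo hashing reshuffles most keys when NUM_SHARDS changes; production systems often use consistent hashing to avoid that.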

🔷 4. Comparison Table

Factor    | Latency Focus    | Throughput Focus
Goal      | Faster responses | More requests
Scaling   | Vertical         | Horizontal
Best for  | Interactive apps | Batch systems
Trade-off | Lower throughput | Higher latency

🔷 5. Real-World Examples

System       | Focus      | Why
Trading app  | Latency    | Every ms counts
Log pipeline | Throughput | Volume > speed
Netflix      | Both       | CDN + async
Banking API  | Latency    | User trust

🔷 6. Architecture Decisions

For Low Latency:

  • Cache frequently accessed data
  • Use CDN for static assets
  • Minimize network round trips
  • Optimize critical path queries

For High Throughput:

  • Horizontal scaling
  • Message queues (Kafka)
  • Batch processing
  • Database sharding
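The batch-processing idea above can be sketched as grouping items so each downstream call amortizes its fixed per-call cost (the batch size and item source are made up; in a real system `process_batch` would be one bulk DB write or one Kafka produce call):

```python
def process_batch(items):
    # Stand-in for one bulk downstream call (DB bulk insert, Kafka batch, etc.)
    return len(items)

def batched(source, batch_size=100):
    """Group incoming items into fixed-size batches to amortize per-call cost."""
    batch = []
    for item in source:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the partial final batch

total = sum(process_batch(b) for b in batched(range(1050), batch_size=100))
print(total)  # 1050 items processed in 11 downstream calls instead of 1050
```

This is the classic throughput-for-latency trade: items wait for their batch to fill, so each one sees a little more delay while the system as a whole handles far more volume.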

🧠 One-line Summary

"Latency is about speed for one user. Throughput is about scale for all users. Real systems need both."


🚀 Final Interview Answer

"Latency measures the time to complete a single request while throughput measures requests per second. Improving latency often uses caching, CDNs, and query optimization. Improving throughput uses horizontal scaling, sharding, and async processing. Modern systems optimize for both using tiered architectures — CDN for reads, load-balanced stateless services for throughput, and databases tuned for the access pattern."