Latency vs Throughput - System Design Interview Trade-off Guide
Table of Contents
- ⚖️ Latency vs Throughput
- 🔷 Core Idea
- 🧠 Analogy
- 🧠 Script
- 💡 FAANG Q
- 🔷 2. Latency (Deep Dive)
- ✅ Key Points
- ⚠️ Trade-offs
- 📌 When to prioritize
- 💡 FAANG Q
- 🔷 3. Throughput (Deep Dive)
- ✅ Key Points
- ⚠️ Trade-offs
- 📌 When to prioritize
- 💡 FAANG Q
- 🔷 4. Comparison Table
- 🔷 5. Real-World Examples
- 🔷 6. Architecture Decisions
- For Low Latency:
- For High Throughput:
- 🧠 One-line Summary
- 🚀 Final Interview Answer
⚖️ Latency vs Throughput
🔷 Core Idea
- Latency = Time for one request to complete (ms)
- Throughput = Requests processed per second (RPS)
🧠 Analogy
🚗 Road:
- Latency → travel time for one car
- Throughput → cars passing per hour
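The road analogy has a formal counterpart, Little's Law: concurrency = throughput × latency. A minimal sketch of the relationship (the numbers below are hypothetical, just to illustrate the arithmetic):

```python
# Little's Law: concurrency = throughput * latency.
# Rearranged: the throughput a server can sustain is bounded by
# (in-flight requests it can hold) / (time each request takes).

def max_throughput(concurrency: int, latency_s: float) -> float:
    """Upper bound on requests/second at a fixed concurrency level."""
    return concurrency / latency_s

# Hypothetical: 100 in-flight requests, 50 ms each -> 2000 RPS ceiling
print(max_throughput(100, 0.050))  # 2000.0
```

Note how the formula makes the trade-off concrete: if latency doubles and concurrency stays fixed, the throughput ceiling halves.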
🧠 Script
"Latency is how fast a single request completes. Throughput is how many requests the system handles per second."
💡 FAANG Q
Q: Can you always improve both latency and throughput simultaneously? A: Not always. Optimizing for one often trades off the other — e.g., batching requests raises throughput but adds queueing delay to each request, while aggressively minimizing per-request latency can leave servers underutilized.
🔷 2. Latency (Deep Dive)
✅ Key Points
- Measured in ms (p50, p95, p99)
- Sources of latency:
- Network hops
- DB queries
- Serialization
- Queueing
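Percentiles like p50/p95/p99 are computed from a sample of observed latencies. A rough nearest-rank sketch (real systems use streaming histograms such as HDR histograms, not a full sort):

```python
import random

def percentile(samples, p):
    """Nearest-rank percentile: the value below which ~p% of samples fall."""
    s = sorted(samples)
    k = max(0, min(len(s) - 1, round(p / 100 * len(s)) - 1))
    return s[k]

# Simulated latencies (hypothetical distribution) in milliseconds
latencies_ms = [random.gauss(80, 20) for _ in range(10_000)]
for p in (50, 95, 99):
    print(f"p{p}: {percentile(latencies_ms, p):.1f} ms")
```

The tail percentiles (p95, p99) matter most in interviews: averages hide the slow requests that users actually notice.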
⚠️ Trade-offs
- ✅ Fast user experience
- ❌ Hard to reduce at distributed scale
- ❌ Cross-region adds 100ms+
📌 When to prioritize
- Interactive apps (chat, trading)
- User-facing APIs
- Real-time systems
💡 FAANG Q
Q: How do you reduce latency? A:
- Caching (avoid DB hits)
- Async processing (non-critical paths)
- CDN (reduce network hops)
- Optimize DB indexes
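Caching is the first lever above. A minimal in-process TTL cache sketch (the decorator name `ttl_cache` and the function `get_user` are hypothetical; production systems would typically use Redis or Memcached instead):

```python
import time
from functools import wraps

def ttl_cache(ttl_s: float):
    """Memoize results for ttl_s seconds to skip repeated slow lookups."""
    def deco(fn):
        store = {}  # args -> (expires_at, value)
        @wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit and hit[0] > now:
                return hit[1]              # cache hit: no DB round trip
            value = fn(*args)              # cache miss: pay full latency
            store[args] = (now + ttl_s, value)
            return value
        return wrapper
    return deco

@ttl_cache(ttl_s=30)
def get_user(user_id):
    time.sleep(0.05)  # stand-in for a slow DB query
    return {"id": user_id}
```

The trade-off to mention: a TTL cache serves data up to 30 s stale here, so it suits read-heavy, tolerance-for-staleness paths, not strongly consistent ones.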
🔷 3. Throughput (Deep Dive)
✅ Key Points
- Measured in RPS / QPS
- Bottlenecks:
- Single DB node
- CPU-bound processing
- Network bandwidth
⚠️ Trade-offs
- ✅ Handle massive scale
- ❌ May sacrifice per-request latency
- ❌ Requires horizontal scaling
📌 When to prioritize
- Batch systems
- Log ingestion
- Streaming pipelines
💡 FAANG Q
Q: How do you increase throughput? A:
- Horizontal scaling (add servers)
- Load balancing
- Async/batch processing
- Partitioning (sharding)
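Partitioning works by routing each key deterministically to one of several nodes so load spreads across them. A minimal hash-sharding sketch (the shard names are hypothetical; real systems often use consistent hashing so resharding moves fewer keys):

```python
import hashlib

SHARDS = ["db0", "db1", "db2", "db3"]  # hypothetical shard nodes

def shard_for(key: str) -> str:
    """Stable hash routing: the same key always lands on the same shard."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return SHARDS[h % len(SHARDS)]

print(shard_for("user:42"))  # deterministic, one of db0..db3
```

With N shards, each node sees roughly 1/N of the traffic, which is how write throughput scales past a single DB node.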
🔷 4. Comparison Table
| Factor | Latency Focus | Throughput Focus |
|---|---|---|
| Goal | Faster responses | More requests |
| Scaling | Vertical | Horizontal |
| Best for | Interactive apps | Batch systems |
| Trade-off | Lower throughput | Higher latency |
🔷 5. Real-World Examples
| System | Focus | Why |
|---|---|---|
| Trading app | Latency | Every ms counts |
| Log pipeline | Throughput | Volume > speed |
| Netflix | Both | CDN + async |
| Banking API | Latency | User trust |
🔷 6. Architecture Decisions
For Low Latency:
- Cache frequently accessed data
- Use CDN for static assets
- Minimize network round trips
- Optimize critical path queries
For High Throughput:
- Horizontal scaling
- Message queues (Kafka)
- Batch processing
- Database sharding
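Batch processing raises throughput by amortizing per-request overhead (connection setup, commit cost) across many items. A minimal batching sketch (the batch size of 4 is arbitrary, for illustration):

```python
from typing import Iterable, Iterator, List

def batches(items: Iterable, size: int) -> Iterator[List]:
    """Group items into fixed-size batches; one write per batch, not per item."""
    batch: List = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:          # flush the final partial batch
        yield batch

for chunk in batches(range(10), size=4):
    print(chunk)  # [0, 1, 2, 3] then [4, 5, 6, 7] then [8, 9]
```

This is also where the latency trade-off shows up: items wait in a batch before being processed, so per-item latency rises as batch size grows.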
🧠 One-line Summary
"Latency is about speed for one user. Throughput is about scale for all users. Real systems need both."
🚀 Final Interview Answer
"Latency measures the time to complete a single request while throughput measures requests per second. Improving latency often uses caching, CDNs, and query optimization. Improving throughput uses horizontal scaling, sharding, and async processing. Modern systems optimize for both using tiered architectures — CDN for reads, load-balanced stateless services for throughput, and databases tuned for the access pattern."