Latency vs Throughput - System Design Interview Trade-off Guide
Table of Contents
- ⚖️ Latency vs Throughput
- 🔷 Core Idea
- 🧠 Analogy
- 🧠 Script
- 💡 FAANG Q
- 🔷 2. Latency (Deep Dive)
- ✅ Key Points
- ⚠️ Trade-offs
- 📌 When to prioritize
- 💡 FAANG Q
- 🔷 3. Throughput (Deep Dive)
- ✅ Key Points
- ⚠️ Trade-offs
- 📌 When to prioritize
- 💡 FAANG Q
- 🔷 4. Comparison Table
- 🔷 5. Real-World Examples
- 🔷 6. Architecture Decisions
- For Low Latency:
- For High Throughput:
- 🧠 One-line Summary
- 🚀 Final Interview Answer
⚖️ Latency vs Throughput
🔷 Core Idea
- Latency = Time for one request to complete (ms)
- Throughput = Requests processed per second (RPS)
🧠 Analogy
🚗 Road:
- Latency → travel time for one car
- Throughput → cars passing per hour
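The road analogy has a formal counterpart, Little's Law: concurrency = throughput × latency. A minimal sketch of the relationship (the numbers below are hypothetical, just to illustrate the arithmetic):

```python
# Little's Law: concurrency = throughput * latency.
# Rearranged: the throughput a server can sustain is bounded by
# (in-flight requests it can hold) / (time each request takes).

def max_throughput(concurrency: int, latency_s: float) -> float:
    """Upper bound on requests/second at a fixed concurrency level."""
    return concurrency / latency_s

# Hypothetical: 100 in-flight requests, 50 ms each -> 2000 RPS ceiling
print(max_throughput(100, 0.050))  # 2000.0
```

Note how the formula makes the trade-off concrete: if latency doubles and concurrency stays fixed, the throughput ceiling halves.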
🧠 Script
"Latency is how fast a single request completes. Throughput is how many requests the system handles per second."
💡 FAANG Q
Q: Can you always improve both latency and throughput simultaneously? A: Not always. Optimizing for one often trades off the other — e.g., batching requests raises throughput but adds queueing delay to each request, while aggressively minimizing per-request latency can leave servers underutilized.
🔷 2. Latency (Deep Dive)
✅ Key Points
- Measured in ms (p50, p95, p99)
- Sources of latency:
- Network hops
- DB queries
- Serialization
- Queueing
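Percentiles like p50/p95/p99 are computed from a sample of observed latencies. A rough nearest-rank sketch (real systems use streaming histograms such as HDR histograms, not a full sort):

```python
import random

def percentile(samples, p):
    """Nearest-rank percentile: the value below which ~p% of samples fall."""
    s = sorted(samples)
    k = max(0, min(len(s) - 1, round(p / 100 * len(s)) - 1))
    return s[k]

# Simulated latencies (hypothetical distribution) in milliseconds
latencies_ms = [random.gauss(80, 20) for _ in range(10_000)]
for p in (50, 95, 99):
    print(f"p{p}: {percentile(latencies_ms, p):.1f} ms")
```

The tail percentiles (p95, p99) matter most in interviews: averages hide the slow requests that users actually notice.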
⚠️ Trade-offs
- ✅ Fast user experience
- ❌ Hard to reduce at distributed scale
- ❌ Cross-region adds 100ms+
📌 When to prioritize
- Interactive apps (chat, trading)
- User-facing APIs
- Real-time systems
💡 FAANG Q
Q: How do you reduce latency? A:
- Caching (avoid DB hits)
- Async processing (non-critical paths)
- CDN (reduce network hops)
- Optimize DB indexes
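Caching is the first lever above. A minimal in-process TTL cache sketch (the decorator name `ttl_cache` and the function `get_user` are hypothetical; production systems would typically use Redis or Memcached instead):

```python
import time
from functools import wraps

def ttl_cache(ttl_s: float):
    """Memoize results for ttl_s seconds to skip repeated slow lookups."""
    def deco(fn):
        store = {}  # args -> (expires_at, value)
        @wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit and hit[0] > now:
                return hit[1]              # cache hit: no DB round trip
            value = fn(*args)              # cache miss: pay full latency
            store[args] = (now + ttl_s, value)
            return value
        return wrapper
    return deco

@ttl_cache(ttl_s=30)
def get_user(user_id):
    time.sleep(0.05)  # stand-in for a slow DB query
    return {"id": user_id}
```

The trade-off to mention: a TTL cache serves data up to 30 s stale here, so it suits read-heavy, tolerance-for-staleness paths, not strongly consistent ones.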
🔷 3. Throughput (Deep Dive)
✅ Key Points
- Measured in RPS / QPS
- Bottlenecks:
- Single DB node
- CPU-bound processing
- Network bandwidth
⚠️ Trade-offs
- ✅ Handle massive scale
- ❌ May sacrifice per-request latency
- ❌ Requires horizontal scaling
📌 When to prioritize
- Batch systems
- Log ingestion
- Streaming pipelines
💡 FAANG Q
Q: How do you increase throughput? A:
- Horizontal scaling (add servers)
- Load balancing
- Async/batch processing
- Partitioning (sharding)
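Partitioning works by routing each key deterministically to one of several nodes so load spreads across them. A minimal hash-sharding sketch (the shard names are hypothetical; real systems often use consistent hashing so resharding moves fewer keys):

```python
import hashlib

SHARDS = ["db0", "db1", "db2", "db3"]  # hypothetical shard nodes

def shard_for(key: str) -> str:
    """Stable hash routing: the same key always lands on the same shard."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return SHARDS[h % len(SHARDS)]

print(shard_for("user:42"))  # deterministic, one of db0..db3
```

With N shards, each node sees roughly 1/N of the traffic, which is how write throughput scales past a single DB node.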
🔷 4. Comparison Table
| Factor | Latency Focus | Throughput Focus |
|---|---|---|
| Goal | Faster responses | More requests |
| Scaling | Vertical | Horizontal |
| Best for | Interactive apps | Batch systems |
| Trade-off | Lower throughput | Higher latency |
🔷 5. Real-World Examples
| System | Focus | Why |
|---|---|---|
| Trading app | Latency | Every ms counts |
| Log pipeline | Throughput | Volume > speed |
| Netflix | Both | CDN + async |
| Banking API | Latency | User trust |
🔷 6. Architecture Decisions
For Low Latency:
- Cache frequently accessed data
- Use CDN for static assets
- Minimize network round trips
- Optimize critical path queries
For High Throughput:
- Horizontal scaling
- Message queues (Kafka)
- Batch processing
- Database sharding
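Batch processing raises throughput by amortizing per-request overhead (connection setup, commit cost) across many items. A minimal batching sketch (the batch size of 4 is arbitrary, for illustration):

```python
from typing import Iterable, Iterator, List

def batches(items: Iterable, size: int) -> Iterator[List]:
    """Group items into fixed-size batches; one write per batch, not per item."""
    batch: List = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:          # flush the final partial batch
        yield batch

for chunk in batches(range(10), size=4):
    print(chunk)  # [0, 1, 2, 3] then [4, 5, 6, 7] then [8, 9]
```

This is also where the latency trade-off shows up: items wait in a batch before being processed, so per-item latency rises as batch size grows.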
🧠 One-line Summary
"Latency is about speed for one user. Throughput is about scale for all users. Real systems need both."
🚀 Final Interview Answer
"Latency measures the time to complete a single request while throughput measures requests per second. Improving latency often uses caching, CDNs, and query optimization. Improving throughput uses horizontal scaling, sharding, and async processing. Modern systems optimize for both using tiered architectures — CDN for reads, load-balanced stateless services for throughput, and databases tuned for the access pattern."