Redundancy and Replication - FAANG System Design Interview Guide
Table of Contents
- 🔷 1. Redundancy (Core Idea)
- 🧠 Definition (Interview Opening)
- ⚡ Why it matters
- 🎯 Analogy
- 🎤 FAANG Question
- 🚨 2. Single Point of Failure (SPOF)
- Example
- 🎤 FAANG Question
- 🔷 3. Replication (Core Concept)
- 🧠 Definition
- 🔧 Basic Flow
- ⚡ Why Replication?
- 🎤 FAANG Question
- ⚙️ 4. Types of Replication (VERY IMPORTANT)
- 🟢 1. Synchronous Replication (Strong Consistency)
- 🧠 Idea
- ✅ Pros
- ❌ Cons
- 🎯 Analogy
- 🎤 FAANG Question
- 🔵 2. Asynchronous Replication (Eventual Consistency)
- 🧠 Idea
- ✅ Pros
- ❌ Cons
- 🎯 Analogy
- 🎤 FAANG Question
- 🟡 3. Semi-Synchronous (Balanced 🔥)
- 🧠 Idea
- ✅ Pros
- 🎤 FAANG Question
- ⚖️ 5. Comparison (Must Remember)
- 🔥 6. Missing but CRITICAL Concepts
- 🧠 A. Read vs Write Scaling
- 🧠 B. Replica Lag (VERY IMPORTANT)
- Problem:
- 🎤 FAANG Question
- 🧠 C. Failover (VERY IMPORTANT)
- Types:
- 🎤 FAANG Question
- 🧠 D. Multi-Leader / Leaderless (ADVANCED 🔥)
- Single Leader (most common)
- Multi-leader
- Leaderless (e.g., Cassandra, Riak)
- 🎤 FAANG Question
- 🧠 E. CAP Theorem Connection 🔥
- ⚠️ 7. Trade-offs
- 🚀 8. Real System Example (FAANG Thinking)
- Example: Instagram
- 🎤 9. FAANG Interview Script
- Start
- Explain
- Trade-off
- Add Depth
- Failure Handling
- Close Strong
- 🧠 Final One-Line (Must Memorize)
- 💡 FAANG-Level Insight (DIFFERENTIATOR)
🔷 1. Redundancy (Core Idea)
🧠 Definition (Interview Opening)
"Redundancy is duplicating critical components to eliminate single points of failure and improve reliability."
⚡ Why it matters
- Prevent data loss
- Keep the system running when a component fails
- Improve availability
🎯 Analogy
💾 Google Docs:
- Your doc is saved on multiple servers → one server fails → no data loss
🎤 FAANG Question
Q: Why is redundancy important? A:
"It removes single points of failure and ensures high availability during failures."
🚨 2. Single Point of Failure (SPOF)
👉 Any component whose failure takes the whole system down
Example
- One DB → crash = system down ❌
- Replicated DB → failover ✅
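The gain from a second copy can be put in numbers. A minimal sketch, assuming each copy fails independently and is up with probability `p` (real failures are often correlated, so treat the result as an upper bound); the function name `combined_availability` is made up for illustration:

```python
def combined_availability(p: float, n: int) -> float:
    """Probability that at least one of n independent copies is up.

    Assumption: failures are independent, which real deployments
    only approximate (shared racks, shared power, shared bugs).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    # The system is down only when every copy is down at once.
    return 1.0 - (1.0 - p) ** n
```

With `p = 0.99`, one copy gives 99% and two copies give about 99.99% under this assumption.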
🎤 FAANG Question
Q: How do you remove SPOF? A:
"By introducing redundancy via replication, multiple instances, and failover mechanisms."
🔷 3. Replication (Core Concept)
🧠 Definition
"Replication is copying data from a primary node to one or more replicas to improve availability, fault tolerance, and scalability."
🔧 Basic Flow
Primary (write) → Replicas (read)
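The flow above can be sketched as a primary that records every change in an ordered log and a replica that replays that log. This is a toy in-memory model, not any specific database's protocol; `Primary`, `Replica`, and `pull` are hypothetical names:

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.log = []  # ordered list of (key, value) changes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries already replayed

    def pull(self, primary):
        # Replay only the entries this replica has not seen yet.
        for key, value in primary.log[self.applied:]:
            self.data[key] = value
        self.applied = len(primary.log)

    def read(self, key):
        return self.data.get(key)
```

Until `pull` runs, the replica does not see the new write; when and how that copy step happens is what separates the replication types.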
⚡ Why Replication?
- High availability
- Read scaling
- Disaster recovery
🎤 FAANG Question
Q: Why use replication? A:
"To improve availability, scale reads, and ensure durability of data."
⚙️ 4. Types of Replication (VERY IMPORTANT)
🟢 1. Synchronous Replication (Strong Consistency)
🧠 Idea
"Write succeeds only after replicas confirm"
✅ Pros
- Strong consistency
- No loss of acknowledged writes if the primary fails
❌ Cons
- Higher write latency (every write waits for replica round trips)
- Writes stall or fail when a replica is unreachable
🎯 Analogy
✍️ Writing a document and photocopying it before you submit it
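A minimal sketch of the synchronous rule, assuming in-memory replicas and a hypothetical `sync_write` helper: the write is reported as successful only when every replica has confirmed. Cleanup of a partially applied write is left out on purpose.

```python
class Replica:
    def __init__(self, up=True):
        self.up = up
        self.data = {}

    def apply(self, key, value):
        """Return True as the acknowledgement, False if unreachable."""
        if not self.up:
            return False
        self.data[key] = value
        return True


def sync_write(primary_data, replicas, key, value):
    primary_data[key] = value
    # Wait for every replica before answering the client.
    acks = [replica.apply(key, value) for replica in replicas]
    # Assumption: a real system would also roll back or retry here;
    # this sketch only reports whether all replicas confirmed.
    return all(acks)
```

One unreachable replica is enough to make the write fail, which is the availability cost of this mode.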
🎤 FAANG Question
Q: When should you use synchronous replication? A:
"When consistency is critical, like banking systems."
🔵 2. Asynchronous Replication (Eventual Consistency)
🧠 Idea
"Primary writes first, replicas update later"
✅ Pros
- Fast writes
- High write availability (writes succeed even when replicas are slow or down)
❌ Cons
- Replica lag (stale reads)
- Possible loss of recent writes if the primary fails before they replicate
🎯 Analogy
📩 WhatsApp message → shows as sent right away, delivered later
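A minimal sketch of the asynchronous case, assuming a single replica modelled as a plain dict and a hypothetical `ship` method standing in for the background replication step:

```python
from collections import deque


class AsyncPrimary:
    def __init__(self, replica_data):
        self.data = {}
        self.pending = deque()  # changes not yet sent to the replica
        self.replica_data = replica_data

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))
        # Acknowledged before the replica has seen the change.
        return True

    def ship(self):
        """Background step: drain pending changes to the replica."""
        while self.pending:
            key, value = self.pending.popleft()
            self.replica_data[key] = value
```

Anything still in `pending` when the primary crashes is the data that can be lost, and a read from the replica before `ship` runs is the stale read.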
🎤 FAANG Question
Q: What is the risk of async replication? A:
"Replica lag can cause stale reads or data loss on failure."
🟡 3. Semi-Synchronous (Balanced 🔥)
🧠 Idea
"Wait for at least one replica to confirm; the others catch up asynchronously"
✅ Pros
- Better consistency than async
- Better performance than sync
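A minimal sketch of the semi-synchronous rule, assuming in-memory replicas: the write returns as soon as one replica confirms, and the rest are queued in a hypothetical `backlog` list to be sent later.

```python
class Replica:
    def __init__(self, up=True):
        self.up = up
        self.data = {}

    def apply(self, key, value):
        if not self.up:
            return False
        self.data[key] = value
        return True


def semi_sync_write(primary_data, replicas, backlog, key, value):
    primary_data[key] = value
    acked = False
    for replica in replicas:
        if not acked and replica.apply(key, value):
            acked = True  # one confirmation is enough to answer the client
        else:
            # Unreachable replicas, and every replica after the first
            # ack, receive the change later (asynchronously).
            backlog.append((replica, key, value))
    return acked
```

The client waits for one replica instead of all of them, and at least two nodes hold the write when it is acknowledged.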
🎤 FAANG Question
Q: Why is semi-sync used? A:
"To balance consistency and latency in production systems."
⚖️ 5. Comparison (Must Remember)
| Type | Consistency | Write latency | Data loss if primary fails |
|---|---|---|---|
| Sync | Strong | High | No (for acknowledged writes) |
| Async | Eventual | Low | Possible |
| Semi-sync | Medium | Medium | Low |
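The latency column can be made concrete with a rough model. This sketch assumes the primary contacts all replicas in parallel and that the numbers passed in are round-trip times in milliseconds; the function name and figures are illustrative only:

```python
def write_latency_ms(mode, local_ms, replica_rtts_ms):
    """Rough client-visible write latency for each replication mode."""
    if mode == "async":
        return local_ms  # no waiting on replicas
    if mode == "semi-sync":
        return local_ms + min(replica_rtts_ms)  # wait for the fastest ack
    if mode == "sync":
        return local_ms + max(replica_rtts_ms)  # wait for the slowest ack
    raise ValueError(f"unknown mode: {mode}")
```

With a 2 ms local write and replicas at 5 ms and 40 ms, this model gives 2 ms for async, 7 ms for semi-sync, and 42 ms for sync.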
🔥 6. Missing but CRITICAL Concepts
🧠 A. Read vs Write Scaling
- Writes → Primary
- Reads → Replicas
👉 "Scale reads horizontally"
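A minimal sketch of the routing rule, assuming a hypothetical `Router` that sends writes to the primary and spreads reads across replicas in round-robin order (node names are plain strings here):

```python
import itertools


class Router:
    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)  # round-robin iterator

    def route(self, op):
        if op == "write":
            return self.primary
        if op == "read":
            return next(self._replicas)
        raise ValueError(f"unknown operation: {op}")
```

Adding a replica adds read capacity; write capacity stays bounded by the single primary.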
🧠 B. Replica Lag (VERY IMPORTANT)
👉 Delay between a write landing on the primary and appearing on a replica
Problem:
- User sees stale data (e.g., their own update is missing right after saving)
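One common way to hide lag from the user who just wrote is to send that user's reads to the primary for a short window after each write. A minimal sketch, assuming a hypothetical `lag_window_s` that is larger than the typical replica lag; timestamps are passed in so the logic is easy to test:

```python
class ReadYourWritesRouter:
    def __init__(self, lag_window_s=1.0):
        self.lag_window_s = lag_window_s  # assumed upper bound on lag
        self._last_write = {}  # user_id -> time of that user's last write

    def record_write(self, user_id, now):
        self._last_write[user_id] = now

    def target_for_read(self, user_id, now):
        last = self._last_write.get(user_id)
        # Recent writers read from the primary; everyone else can
        # tolerate a slightly stale replica.
        if last is not None and now - last < self.lag_window_s:
            return "primary"
        return "replica"
```

Only recent writers add read load to the primary, so most read traffic still lands on replicas.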
🎤 FAANG Question
Q: How do you handle stale reads? A:
"Use read-after-write consistency or route critical reads to primary."
🧠 C. Failover (VERY IMPORTANT)
👉 When primary fails:
- Promote replica → new primary
Types:
- Manual
- Automatic (preferred)
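Automatic failover needs a rule for which replica to promote. A minimal sketch of one reasonable rule, picking the healthy replica that has applied the most of the primary's log; `ReplicaState` and `applied_position` are hypothetical names, and real systems also need failure detection and fencing of the old primary:

```python
from dataclasses import dataclass


@dataclass
class ReplicaState:
    name: str
    healthy: bool
    applied_position: int  # how far this replica has replayed the log


def choose_new_primary(replicas):
    candidates = [r for r in replicas if r.healthy]
    if not candidates:
        raise RuntimeError("no healthy replica to promote")
    # Promote the most up-to-date replica to minimise lost writes.
    return max(candidates, key=lambda r: r.applied_position)
```

With async replication, even the most up-to-date replica may be missing the last few writes, which is where failover data loss comes from.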
🎤 FAANG Question
Q: What happens when primary fails? A:
"A replica is promoted to primary via failover mechanisms."
🧠 D. Multi-Leader / Leaderless (ADVANCED 🔥)
Single Leader (most common)
- One write node
Multi-leader
- Multiple writable nodes
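With multiple writable nodes, two leaders can accept different values for the same key, and the system has to pick one. A minimal sketch of last-write-wins, one of several possible strategies; it assumes each value is stored as a `(timestamp, node_id, value)` tuple, with the node id used as a tie-breaker:

```python
def lww_merge(a, b):
    """Merge two leaders' states; the newer (timestamp, node_id) wins."""
    merged = dict(a)
    for key, (ts, node, value) in b.items():
        if key not in merged or (ts, node) > merged[key][:2]:
            merged[key] = (ts, node, value)
    return merged
```

Last-write-wins is simple but silently drops the losing write, and it depends on clocks being reasonably in sync across leaders.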
Leaderless (e.g., Cassandra, Riak)
- No primary; any replica can accept reads and writes
🎤 FAANG Question
Q: Why not always multi-leader? A:
"Because it introduces conflict resolution complexity."
🧠 E. CAP Theorem Connection 🔥
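- Sync → leans CP (blocks or rejects writes when a replica is unreachable, so replicas stay consistent)
- Async → leans AP (keeps accepting writes during a partition, so replicas may serve stale data)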
⚠️ 7. Trade-offs
| Benefit | Cost |
|---|---|
| High availability | Complexity |
| Fault tolerance | Replication lag |
| Read scaling | Consistency issues |
🚀 8. Real System Example (FAANG Thinking)
Example: Instagram
- Primary DB → writes
- Replicas → serve feed
👉 High read traffic → replicas handle load
🎤 9. FAANG Interview Script
Start
"To improve availability and scalability, I'll introduce database replication."
Explain
"Writes go to primary, and replicas serve read traffic."
Trade-off
"Async replication improves performance but introduces eventual consistency."
Add Depth
"We can use semi-sync to balance latency and consistency."
Failure Handling
"In case of failure, a replica will be promoted via failover."
Close Strong
"This removes the single point of failure and enables horizontal read scaling."
🧠 Final One-Line (Must Memorize)
"Replication improves availability and read scalability by duplicating data across nodes."
💡 FAANG-Level Insight (DIFFERENTIATOR)
"Replication is not just about backup; it's about scaling reads and surviving failures with minimal downtime."