1. Problem with Naive Hashing (mod N)

✅ Approach

server = hash(key) % N

❌ Problems

When N changes (node added/removed) → ALL keys remapped
Causes massive data movement

👉 Analogy: Changing number of buckets → all items reshuffled

🔥 FAANG Question

Q: Why is modulo-based sharding bad for scaling? A: Because adding/removing a node changes N, forcing re-hashing of almost all keys (~100%)

🧠 Script

"Modulo hashing fails in dynamic systems because any node change causes full data reshuffling."

2. Consistent Hashing (Core Idea)

✅ Key Idea

Map nodes + keys onto a ring (0 → 2³²)
Each key goes to next clockwise node

✅ Benefits

Only small portion of data moves when nodes change

👉 Analogy: Circular pizza → add/remove slice → only nearby slices affected

🔥 FAANG Question

Q: How does consistent hashing reduce rebalancing? A: Only keys belonging to affected node's range move (~1/N), not all keys

🧠 Script

"Consistent hashing maps data on a ring and ensures minimal data movement during scaling."

3. How It Works (Tokens & Ranges)

✅ Key Points

Each node has a token (position on ring)
Responsible for range:

[token, next_token)

Key → hash → falls into a range → assigned node

🔥 FAANG Question

Q: How do you locate a node for a key? A: Hash the key → move clockwise → first node found

🧠 Script

"Hash the key, move clockwise on the ring, and assign it to the first node."

4. Problem in Basic Consistent Hashing

❌ Issues

Uneven data distribution
Hotspots (one node overloaded)
Hard to scale efficiently

🔥 FAANG Question

Q: Why does basic consistent hashing create hotspots? A: Because nodes own large uneven ranges → skewed data distribution

🧠 Script

"Single token per node leads to uneven load and hotspots."

5. Virtual Nodes (Vnodes) — MOST IMPORTANT

✅ Key Idea

Each physical node has multiple small ranges (vnodes)
Instead of 1 token → many tokens

✅ Benefits

Better load balancing
Faster rebalancing
No hotspots
Supports heterogeneous machines
Parallel node recovery

🔥 FAANG Question

Q: Why do systems like Cassandra use virtual nodes? A: To evenly distribute load and minimize hotspots by splitting ranges into smaller chunks

🧠 Script

"Virtual nodes divide data into smaller chunks, improving load balancing and scaling efficiency."

🔷 7. Replication in Consistent Hashing

✅ Key Idea

Each key stored on N nodes (Replication Factor)
Coordinator = first node
Replicas = next clockwise nodes

✅ Types

Synchronous → strong consistency
Asynchronous → eventual consistency (common)

🔥 FAANG Question

Q: How is replication done in consistent hashing? A: Data is stored on primary node + replicated to next N-1 nodes clockwise

🧠 Script

"Data is stored on the coordinator and replicated to successor nodes for fault tolerance."

🔷 8. Eventual Consistency

✅ Key Idea

Data may be temporarily inconsistent
Eventually becomes consistent

👉 Trade-off:

High availability > strict consistency

🔥 FAANG Question

Q: Why do systems choose eventual consistency? A: To maintain availability during failures (CAP theorem)

🧠 Script

"Eventual consistency sacrifices immediate consistency for high availability."

🔷 9. Real-World Use Cases

✅ Used in:

Distributed DBs → Apache Cassandra
Key-value stores → Amazon DynamoDB
Caching systems (Redis clusters, CDN)

🔥 FAANG Question

Q: Where would you use consistent hashing? A: In systems needing dynamic scaling like caching, databases, and CDNs

🧠 Script

"Consistent hashing is used in databases, caches, and CDNs for scalable data distribution."

🔷 10. Interview Gold Points (Often Missed)

⭐ Add these to stand out:

Rebalancing cost = O(K/N) not O(K)
Supports horizontal scaling
Works with replication + quorum (R/W) systems
Helps avoid single point of failure
Used in load balancers

🔥 FAANG Question

Q: How much data moves when adding a node? A: Only ~1/N data, not entire dataset

🧠 Script

"Consistent hashing ensures minimal data movement and supports efficient horizontal scaling."

🚀 Final 30-Second Summary (Perfect Interview Answer)

"Consistent hashing is used to partition data across nodes using a ring structure. Unlike modulo hashing, it minimizes data movement when nodes are added or removed. Virtual nodes further improve load balancing and eliminate hotspots. Data is replicated across multiple nodes for fault tolerance, usually with eventual consistency. This approach is widely used in systems like Cassandra and DynamoDB for scalable and highly available architectures."

Table of Contents

1. Problem with Naive Hashing (mod N)

✅ Approach

❌ Problems

🔥 FAANG Question

🧠 Script

2. Consistent Hashing (Core Idea)

✅ Key Idea

✅ Benefits

🔥 FAANG Question

🧠 Script

3. How It Works (Tokens & Ranges)

✅ Key Points

🔥 FAANG Question

🧠 Script

4. Problem in Basic Consistent Hashing

❌ Issues

🔥 FAANG Question

🧠 Script

5. Virtual Nodes (Vnodes) — MOST IMPORTANT

✅ Key Idea

✅ Benefits

🔥 FAANG Question

🧠 Script

🔷 7. Replication in Consistent Hashing

✅ Key Idea

✅ Types

🔥 FAANG Question

🧠 Script

🔷 8. Eventual Consistency

✅ Key Idea

🔥 FAANG Question

🧠 Script

🔷 9. Real-World Use Cases

✅ Used in:

🔥 FAANG Question

🧠 Script

🔷 10. Interview Gold Points (Often Missed)

⭐ Add these to stand out:

🔥 FAANG Question

🧠 Script

🚀 Final 30-Second Summary (Perfect Interview Answer)