Logo
Published on

Database Indexing - FAANG System Design Interview Guide

🔷 Database Indexing (FAANG Level)

🧠 1. Core Idea (Strong Opening)

"An index is a data structure that allows fast lookup of rows without scanning the entire table."


⚡ 2. Why Indexing (REAL Insight)

Without index:

  • Full table scan ❌ (O(n))

With index:

  • Direct lookup ✅ (O(log n))

Analogy

📚 Book:

  • No index → read whole book
  • With index → jump to page

🎤 FAANG Question

Q: Why do we need indexing? A:

"To reduce query latency by avoiding full table scans and enabling efficient lookups."


⚙️ 3. How it Works (Simple but Important)

👉 Index stores:

  • Column value
  • Pointer to actual row

👉 DB uses index → finds location → fetches row


🔥 Key Insight

"Index = extra data structure optimized for reads"


🧩 4. Types of Indexes (Must Know)


🟢 1. Primary Index

👉 On primary key

  • Unique
  • One per table

🔵 2. Secondary Index

👉 On non-primary columns

Example:

  • search by email, name

🎤 FAANG Question

Q: When do you add secondary index? A:

"When a column is frequently used in WHERE, JOIN, or ORDER BY clauses."


🟣 3. Composite Index (IMPORTANT 🔥)

👉 Multiple columns

Example:

(index: userId, createdAt)

Rule:

"Order matters"


🎤 FAANG Question

Q: Why column order matters? A:

"Because indexes are sorted, so prefix columns are used efficiently."


⚡ 4. Covering Index (ADVANCED 🔥)

👉 Index contains all needed columns

Benefit:

  • No need to hit DB table

⚙️ 5. Underlying Data Structures (FAANG MUST)

🌳 B-Tree (Most Common)

👉 Balanced tree

  • Search: O(log n)
  • Supports range queries

⚡ Hash Index

👉 Key → value mapping

  • Fast lookup
  • ❌ No range queries

🎤 FAANG Question

Q: Why B-Tree over hash? A:

"Because B-Trees support both exact match and range queries."


⚠️ 6. Trade-offs (VERY IMPORTANT)

Reads vs Writes

Operation Impact
Read Faster ✅
Write Slower ❌

Why writes slow?

👉 Need to:

  • Update data
  • Update index

🎤 FAANG Question

Q: Why not index everything? A:

"Because indexes increase write latency and storage overhead."


🚨 7. When NOT to Use Index (Often Asked)

  • Small tables (full scan is fine)
  • High write systems (logs, metrics)
  • Low cardinality columns (e.g., gender)

🎤 FAANG Question

Q: Why avoid indexing low-cardinality columns? A:

"Because they don't filter much data, so index is not useful."


🔥 8. Missing but CRITICAL Concepts


🧠 A. Cardinality (VERY IMPORTANT)

👉 Uniqueness of data

  • High cardinality → good index
  • Low cardinality → bad index

🧠 B. Index Selectivity

👉 How much data it filters

"Higher selectivity = better index"


🧠 C. Index Maintenance Cost

  • Storage overhead
  • Rebalancing (B-Tree updates)

🧠 D. Clustered vs Non-Clustered (IMPORTANT)

Clustered

  • Data stored in index order

Non-clustered

  • Separate structure

🎤 FAANG Question

Q: Difference between clustered & non-clustered? A:

"Clustered defines physical order of data, non-clustered stores pointers."


⚡ 9. Real System Thinking

Example: Instagram

  • Index on userId → fetch posts
  • Index on createdAt → sort feed

👉 Without index:

  • Feed = slow ❌

⚖️ 10. Trade-off Summary

Benefit Cost
Fast reads Slow writes
Efficient queries Extra storage
Scalable queries Maintenance overhead

🎤 11. FAANG Interview Script

Start

"To optimize query performance, I'll add indexes on frequently queried columns."


Explain

"Indexes allow us to avoid full table scans and perform O(log n) lookups."


Smart Choice

"I'll use composite indexes based on query patterns."


Trade-off

"However, indexes increase write latency and storage overhead."


Close Strong

"So I'll only index high-cardinality, frequently queried fields."


🧠 Final One-Line (Must Memorize)

"Indexes improve read performance by trading off write latency and storage."


💡 FAANG-Level Insight (DIFFERENTIATOR)

"Good engineers add indexes. Great engineers design indexes based on query patterns and access frequency."