Back-of-the-Envelope Estimation - System Design Interview Complete Guide
Table of Contents
- 🔷 Back-of-the-Envelope Estimation (BOE)
- 🧠 What it REALLY is
- 🎯 Why it matters (REAL reason)
- ❗ This is what interviewers check:
- 🔥 Key Insight
- ⚡ What BOE helps you decide
- 🧩 Core Types (Must Know)
- 1. Load (Traffic)
- 2. Storage
- 3. Bandwidth
- 4. Latency
- 5. Compute (Servers/CPU)
- 🧠 Golden Technique (How to Think)
- Step 1: Break it down
- Step 2: Assume smartly
- Step 3: Convert to per second
- Step 4: Sanity check
- ⚡ Powerful Shortcuts (Rules of Thumb)
- 📝 Real Example (Interview Style)
- Design Instagram
- Load:
- Storage (with full 1MB posts):
- Storage (metadata only, 1KB/post):
- ⚖️ Most Important Insight
- ❌ Common Mistakes
- 🎤 Interview Script (MEMORIZE THIS)
- Start:
- Assumptions:
- Calculate:
- Expand:
- Insight:
- 🧠 One-Line Summary
🔷 Back-of-the-Envelope Estimation (BOE)
🧠 What it REALLY is
📌 Quick math to understand scale
"Not exact – just directionally correct"
💬 Say this in interview:
"I'm not trying to be precise here – I just need to understand the order of magnitude so my design choices are justified."
🎯 Why it matters (REAL reason)
❗ This is what interviewers check:
- Can you reason about scale?
- Can you justify your design?
🔥 Key Insight
"Without estimation → your design is just guessing"
💬 Say this in interview:
"I want to do a quick estimation before jumping into design – this will help me justify decisions like whether we need caching, sharding, or a CDN."
⚡ What BOE helps you decide
- How many servers?
- How much storage?
- Can the system handle the traffic?
- Where are the bottlenecks?
💬 Say this in interview:
"Based on these numbers, I'll determine what kind of infrastructure we need and where the scaling pressure will be."
🧩 Core Types (Must Know)
1. Load (Traffic)
📌 Requests per second (RPS)
Example:
- 10⁶ (1 Million) users × 10 actions/day = 10⁷ (10 Million) requests/day
- 10⁷ ÷ 10⁵ seconds (86,400 ≈ 10⁵) = 10² = ~100 RPS ✅
Shortcut: 10⁷ (10M) requests ÷ 10⁵ (100K) seconds = 10² = 100 RPS
💬 Say this in interview:
"With [X] million DAU and roughly [Y] actions per user per day, that's [X×Y] million requests per day – divided by 86,400 seconds, that's about [Z] RPS. At peak, I'd assume 2–3x that, so roughly [2Z–3Z] RPS."
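As a sketch, the load arithmetic above in Python (the 3× peak factor is an assumption, matching the 2–3× rule of thumb):

```python
# Load estimation: users × actions/day → requests per second.
SECONDS_PER_DAY = 86_400  # ≈ 10^5 for mental math

def estimate_rps(users, actions_per_user_per_day, peak_factor=3.0):
    """Return (average RPS, assumed peak RPS) for a daily workload."""
    requests_per_day = users * actions_per_user_per_day
    avg = requests_per_day / SECONDS_PER_DAY
    return avg, avg * peak_factor

avg, peak = estimate_rps(1_000_000, 10)  # 10M requests/day
print(round(avg), round(peak))           # ~116 average, ~347 at peak
```

With the 10⁵-seconds shortcut this rounds to the ~100 RPS figure above; the exact divisor only shifts the answer by about 15%, which is noise at this level of precision.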
2. Storage
📌 Data growth
Example:
- 1 photo = 2 MB = 2 × 10⁶ bytes
- 1 Million uploads/day = 10⁶ photos/day
- 10⁶ (1M) × 2 × 10⁶ (2MB) bytes = 2 × 10¹² bytes/day
Now convert bytes → TB:
- 2 × 10¹² bytes ÷ 10³ = 2 × 10⁹ (2 Billion) KB
- 2 × 10⁹ KB ÷ 10³ = 2 × 10⁶ (2 Million) MB
- 2 × 10⁶ MB ÷ 10³ = 2 × 10³ (2 Thousand) GB
- 2 × 10³ GB ÷ 10³ = 2 TB/day ✅
Shortcut: 1M items × 2MB = 2 × 10¹² bytes = 2 TB
(10⁶ × 10⁶ = 10¹² = 1 TB → × 2 = 2 TB)
💬 Say this in interview:
"Each [item] is roughly [size]. With [N] [items] per day, that's [N × size] per day, or about [Y] TB per year. We'll need scalable object storage – something like S3."
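The same storage math as a one-function Python sketch (the 1M-photos, 2 MB figures are the example's assumptions):

```python
# Storage growth: items/day × bytes/item, reported in TB (10^12 bytes, decimal units).
def daily_storage_tb(items_per_day, item_size_bytes):
    return items_per_day * item_size_bytes / 10**12

per_day = daily_storage_tb(1_000_000, 2 * 10**6)  # 1M photos at 2 MB each
print(per_day, per_day * 365)  # 2.0 TB/day, 730.0 TB/year
```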
3. Bandwidth
📌 Data transfer per second
Example:
- 1 video stream = 5 MB/s = 5 × 10⁶ bytes/sec
- 1 Thousand concurrent users = 10³ users
- 10³ (1K) × 5 × 10⁶ (5MB) bytes/sec = 5 × 10⁹ bytes/sec
Convert bytes/sec → GB/sec:
- 5 × 10⁹ bytes ÷ 10³ = 5 × 10⁶ (5 Million) KB
- 5 × 10⁶ KB ÷ 10³ = 5 × 10³ (5 Thousand) MB
- 5 × 10³ MB ÷ 10³ = 5 GB/sec ✅
Shortcut: 1K users × 5MB = 5 × 10⁹ bytes = 5 GB
(10³ × 10⁶ = 10⁹ = 1 GB → × 5 = 5 GB)
💬 Say this in interview:
"If [N] users are streaming simultaneously and each stream is [X] Mbps, total egress bandwidth is [N × X] Mbps. That tells me we definitely need a CDN to handle this at the edge."
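The bandwidth calculation above, sketched in Python with the example's assumed numbers (1K concurrent viewers, 5 MB/s per stream):

```python
# Egress bandwidth: concurrent streams × bytes/sec per stream, in GB/sec.
def egress_gb_per_sec(concurrent_users, bytes_per_sec_each):
    return concurrent_users * bytes_per_sec_each / 10**9  # 10^9 bytes = 1 GB

print(egress_gb_per_sec(1_000, 5 * 10**6))  # 5.0 GB/sec
```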
4. Latency
📌 Time taken
Sequential example (calls happen one after another → add them up):
User Feed Request:
Auth Service → 20 ms
User Service → 30 ms
Post Service → 50 ms
Ranking Service → 40 ms
─────────────────────────
Total = 140 ms ✅
(within 200ms p99)
Parallel example (calls happen at the same time → take the max):
User Feed Request:
Auth Service → 20 ms ─┐
Post Service → 50 ms ─┤ (all fire at once)
Ranking Service → 40 ms ─┘
─────────────────────────
Total = max(20, 50, 40) = 50 ms ✅
(much faster!)
Rule: Sequential = sum all. Parallel = take the slowest one. Always ask: "Can these calls be parallelized?" – it can cut latency dramatically.
💬 Say this in interview:
"This request involves 4 sequential service calls – auth (20ms), user lookup (30ms), post fetch (50ms), and ranking (40ms) – totaling 140ms, which is within our 200ms p99 target. If we're ever close to the limit, I'd parallelize the post fetch and ranking calls, bringing the total down to about 100ms (20 + 30 + max(50, 40))."
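The sequential-vs-parallel rule is easy to encode; this sketch uses the feed example's latencies:

```python
# Sequential calls add up; parallel calls cost only the slowest one.
def sequential_ms(latencies):
    return sum(latencies)

def parallel_ms(latencies):
    return max(latencies)

feed = [20, 30, 50, 40]  # auth, user, post, ranking (ms)
print(sequential_ms(feed))        # 140 ms – all four in a chain
print(parallel_ms([20, 50, 40]))  # 50 ms – auth, post, ranking fired together
```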
5. Compute (Servers/CPU)
📌 How many CPU cores and servers do we need?
Formula (from Little's law: requests in flight = RPS × latency; assume one core per in-flight request):
CPU cores needed = RPS × latency (in seconds)
Servers needed = CPU cores ÷ cores per server
Example (using the same 10K RPS and 50 ms figures that the Instagram example below produces):
RPS = 10K = 10⁴ (10 Thousand) req/sec
Latency = 50 ms (parallel path from the latency example) = 5 × 10⁻² sec
CPU cores = 10⁴ × 5 × 10⁻² = 500 cores
= 10⁴ × 10⁻² = 10² = 100 → × 5 = 500 cores ✅
Each server = 16 cores (standard)
Servers = 500 ÷ 16 ≈ 32 servers
Add a 2× safety buffer for peak traffic and redundancy:
32 × 2 = ~64 servers ✅
Summary table:
| What | Value | Power of 10 |
|---|---|---|
| RPS | 10K | 10⁴ |
| Latency | 50 ms | 5 × 10⁻² |
| CPU cores | ~500 | 5 × 10² |
| Cores/server | 16 | – |
| Base servers | ~32 | ~3.2 × 10¹ |
| With 2× buffer | ~64 servers | ~6.4 × 10¹ |
Rule of thumb: Always add a 2× safety multiplier for peak traffic and node failures.
💬 Say this in interview:
"We established 10K RPS from our traffic estimation, and from our latency analysis each request takes about 50ms end-to-end. Using the formula – RPS × latency(sec) = CPU cores – that's 10,000 × 0.05 = 500 CPU cores. With 16-core servers, that's about 32 servers. Adding a 2x buffer for peak traffic and redundancy, I'd provision around 64 servers to start, with auto-scaling enabled."
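The sizing formula above can be sketched as a small Python helper (the 16-core server and 2× buffer are the section's stated assumptions):

```python
import math

# Cores ≈ RPS × latency(sec) (Little's law, one core per in-flight request);
# servers = ceil(cores / cores-per-server) × safety buffer.
def servers_needed(rps, latency_sec, cores_per_server=16, safety_factor=2):
    cores = rps * latency_sec
    base_servers = math.ceil(cores / cores_per_server)
    return base_servers * safety_factor

print(servers_needed(10_000, 0.05))  # 500 cores → 32 servers → 64 with buffer
```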
🧠 Golden Technique (How to Think)
Step 1: Break it down
"Users → actions → data"
💬 Say this in interview:
"Let me break this down: how many users, how often they act, and how much data each action generates."
Step 2: Assume smartly
Use round numbers:
- 1K, 1M, 1B
- 1 KB, 1 MB, 1 GB
💬 Say this in interview:
"I'll use round numbers – these are estimates, not exact figures. If you disagree with any assumption, let me know and I'll adjust."
Step 3: Convert to per second
📌 Always go to RPS
"per day ÷ 86,400"
💬 Say this in interview:
"I always convert to per-second numbers – that's what matters for infrastructure sizing. One day is roughly 86,400 seconds, so I'll round to 100K for simplicity."
Step 4: Sanity check
Ask:
"Does this feel realistic?"
💬 Say this in interview:
"Let me sanity check – [X] TB/day feels right for a system of this scale. Instagram reportedly stores petabytes, so we're in the right ballpark."
⚡ Powerful Shortcuts (Rules of Thumb)
| Unit | Value | Power of 10 |
|---|---|---|
| 1 day | ~86K s | ~10⁵ s |
| Thousand | 1K | 10³ |
| Million | 1M | 10⁶ |
| Billion | 1B | 10⁹ |
| Trillion | 1T | 10¹² |
| 1 KB | 1,000 bytes | 10³ B |
| 1 MB | 1,000 KB | 10⁶ B |
| 1 GB | 1,000 MB | 10⁹ B |
| 1 TB | 1,000 GB | 10¹² B |
| 1 PB | 1,000 TB | 10¹⁵ B |
💬 Say this in interview:
"I'll use the standard shortcut – 1 day ≈ 100K seconds. It keeps the math clean and the interviewer can follow along easily."
📝 Real Example (Interview Style)
Design Instagram
Assume:
- 10⁸ (100 Million) users
- 10 posts/day
Load:
- 10⁸ (100M) × 10 = 10⁹ (1 Billion) posts/day
- 10⁹ ÷ 10⁵ (100K sec/day) = 10⁴ = 10K RPS ✅
Storage (with full 1MB posts):
- 1 post = 1 MB = 10⁶ bytes
- 10⁹ (1 Billion) posts × 10⁶ (1MB) bytes = 10¹⁵ bytes
- 10¹⁵ ÷ 10¹² = 1,000 TB = 1 PB/day ✅
Storage (metadata only, 1KB/post):
- 10⁹ (1 Billion) posts × 10³ (1KB) bytes = 10¹² bytes = 1 TB/day ✅
Rule: 1 Billion × 1KB = 1TB → 10⁹ × 10³ = 10¹²
📌 Now you KNOW:
- Need distributed storage
- Need CDN
- Need sharding
💬 Say this in interview:
"100 million DAU, each posting 10 times a day – that's 1 billion write events per day, or about 10,000 writes per second. Each post with metadata and a compressed image is roughly 1 MB – so 1 billion × 1 MB = 1 petabyte per day for new writes. That clearly requires distributed object storage like S3, a CDN for reads, and database sharding for write throughput."
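The whole Instagram estimate fits in a few lines of Python, mirroring the powers-of-10 arithmetic above (all inputs are the example's assumptions):

```python
# Instagram-style end-to-end estimate using the 86,400 s ≈ 10^5 shortcut.
SECONDS_PER_DAY_APPROX = 10**5

users = 10**8                 # 100M users
posts_per_day = users * 10    # 10^9 = 1 billion posts/day

write_rps = posts_per_day // SECONDS_PER_DAY_APPROX     # 10^4 = 10K writes/sec
media_pb_per_day = posts_per_day * 10**6 / 10**15       # 1 MB/post → PB/day
metadata_tb_per_day = posts_per_day * 10**3 / 10**12    # 1 KB/post → TB/day

print(write_rps, media_pb_per_day, metadata_tb_per_day)  # 10000 1.0 1.0
```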
⚖️ Most Important Insight
"Estimation → drives architecture decisions"
Example:
- High RPS → load balancer + scaling
- Huge storage → S3 + sharding
- High bandwidth → CDN
💬 Say this in interview:
"These estimates aren't just numbers – they tell me which architectural components I actually need. High RPS means I need horizontal scaling and a load balancer. Petabyte-scale storage means I can't use a single relational DB – I need object storage and sharding. High bandwidth means I need a CDN to avoid hammering the origin servers."
❌ Common Mistakes
- Trying to be exact ❌
- No assumptions ❌
- Not converting to per second ❌
- Ignoring peak traffic ❌
💬 Say this in interview:
"I'll make sure to state my assumptions explicitly, work in round numbers, convert everything to per-second figures, and account for peak traffic – usually 2–3x the average."
🎤 Interview Script (MEMORIZE THIS)
Start:
"Before designing, let me do a quick back-of-the-envelope estimation to understand system scale and justify my architectural choices."
Assumptions:
"I'll assume 100 million DAU, with each user performing roughly 10 actions per day. That's 1 billion requests per day – about 10,000 to 12,000 RPS on average, and around 30,000 at peak."
Calculate:
"For storage: if each action generates 1 KB of metadata, that's 1 TB of metadata per day. If we also store media at roughly 1 MB each, and 10% of actions include media, that's 100 TB/day of media – about 36 PB per year."
Expand:
"For bandwidth: at 30K RPS with a 1 KB average response, that's 30 MB/sec of read bandwidth. For media serving, if 1% of users stream 1 MB/sec simultaneously, that's 1 TB/sec – we absolutely need a CDN."
Insight:
"Based on these numbers, the system needs: horizontal scaling behind a load balancer, a distributed database with sharding for write throughput, Redis or Memcached for hot reads, object storage like S3 for media, and a CDN for global low-latency delivery."
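If you want to verify the script's figures before memorizing them, the arithmetic checks out in a few lines of Python (every input is one of the script's stated assumptions):

```python
# Sanity-checking the interview script's numbers.
dau = 10**8                                  # 100M daily active users
requests_per_day = dau * 10                  # 1 billion/day
avg_rps = requests_per_day / 86_400          # ≈ 11,574 – "10,000 to 12,000"

metadata_tb_day = requests_per_day * 10**3 / 10**12       # 1 KB each → 1.0 TB/day
media_tb_day = 0.10 * requests_per_day * 10**6 / 10**12   # 10% carry 1 MB media
media_pb_year = media_tb_day * 365 / 1000                 # ≈ 36.5 PB/year

read_mb_sec = 30_000 * 10**3 / 10**6         # 30K RPS × 1 KB responses
stream_tb_sec = 0.01 * dau * 10**6 / 10**12  # 1% of users streaming 1 MB/s

print(round(avg_rps), metadata_tb_day, media_tb_day, media_pb_year,
      read_mb_sec, stream_tb_sec)
```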
🧠 One-Line Summary
"Back-of-the-envelope estimation is used to quickly approximate system scale and guide architecture decisions."
💬 Use this to open or close the estimation section in any interview:
"The goal of estimation isn't precision – it's to make sure my design is built for the right scale, not over-engineered or under-powered."