Back-of-the-Envelope Estimation - System Design Interview Complete Guide
Table of Contents
- 🔷 Back-of-the-Envelope Estimation (BOE)
- 🧠 What it REALLY is
- 🎯 Why it matters (REAL reason)
- ❗ This is what interviewers check:
- 🔥 Key Insight
- ⚡ What BOE helps you decide
- 🧩 Core Types (Must Know)
- 1. Load (Traffic)
- 2. Storage
- 3. Bandwidth
- 4. Latency
- 5. Compute (Servers/CPU)
- 🧠 Golden Technique (How to Think)
- Step 1: Break it down
- Step 2: Assume smartly
- Step 3: Convert to per second
- Step 4: Sanity check
- ⚡ Powerful Shortcuts (Rules of Thumb)
- 📝 Real Example (Interview Style)
- Design Instagram
- Load:
- Storage (with full 1MB posts):
- Storage (metadata only, 1KB/post):
- ⚖️ Most Important Insight
- ❌ Common Mistakes
- 🎤 Interview Script (MEMORIZE THIS)
- Start:
- Assumptions:
- Calculate:
- Expand:
- Insight:
- 🧠 One-Line Summary
🔷 Back-of-the-Envelope Estimation (BOE)
🧠 What it REALLY is
📌 Quick math to understand scale
"Not exact – just directionally correct"
💬 Say this in interview:
"I'm not trying to be precise here – I just need to understand the order of magnitude so my design choices are justified."
🎯 Why it matters (REAL reason)
❗ This is what interviewers check:
- Can you reason about scale?
- Can you justify your design?
🔥 Key Insight
"Without estimation → your design is just guessing"
💬 Say this in interview:
"I want to do a quick estimation before jumping into design – this will help me justify decisions like whether we need caching, sharding, or a CDN."
⚡ What BOE helps you decide
- How many servers?
- How much storage?
- Can the system handle the traffic?
- Where are the bottlenecks?
💬 Say this in interview:
"Based on these numbers, I'll determine what kind of infrastructure we need and where the scaling pressure will be."
🧩 Core Types (Must Know)
1. Load (Traffic)
📌 Requests per second (RPS)
Example:
- 10⁶ (1 Million) users × 10 actions/day = 10⁷ (10 Million) requests/day
- 10⁷ ÷ 10⁵ seconds (86,400 ≈ 10⁵) = 10² = ~100 RPS ✅
Shortcut: 10⁷ (10M) requests ÷ 10⁵ (100K) seconds = 10² = 100 RPS
💬 Say this in interview:
"With [X] million DAU and roughly [Y] actions per user per day, that's [X×Y] million requests per day – divided by 86,400 seconds, that's about [Z] RPS. At peak, I'd assume 2–3x that, so roughly [2Z–3Z] RPS."
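As a sketch, the load arithmetic above in Python (the 3× peak factor is an assumption, matching the 2–3× rule of thumb):

```python
# Load estimation: users × actions/day → requests per second.
SECONDS_PER_DAY = 86_400  # ≈ 10^5 for mental math

def estimate_rps(users, actions_per_user_per_day, peak_factor=3.0):
    """Return (average RPS, assumed peak RPS) for a daily workload."""
    requests_per_day = users * actions_per_user_per_day
    avg = requests_per_day / SECONDS_PER_DAY
    return avg, avg * peak_factor

avg, peak = estimate_rps(1_000_000, 10)  # 10M requests/day
print(round(avg), round(peak))           # ~116 average, ~347 at peak
```

With the 10⁵-seconds shortcut this rounds to the ~100 RPS figure above; the exact divisor only shifts the answer by about 15%, which is noise at this level of precision.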
2. Storage
📌 Data growth
Example:
- 1 photo = 2 MB = 2 × 10⁶ bytes
- 1 Million uploads/day = 10⁶ photos/day
- 10⁶ (1M) × 2 × 10⁶ (2MB) bytes = 2 × 10¹² bytes/day
Now convert bytes → TB:
- 2 × 10¹² bytes ÷ 10³ = 2 × 10⁹ (2 Billion) KB
- 2 × 10⁹ KB ÷ 10³ = 2 × 10⁶ (2 Million) MB
- 2 × 10⁶ MB ÷ 10³ = 2 × 10³ (2 Thousand) GB
- 2 × 10³ GB ÷ 10³ = 2 TB/day ✅
Shortcut: 1M items × 2MB = 2 × 10¹² bytes = 2 TB
(10⁶ × 10⁶ = 10¹² = 1 TB → × 2 = 2 TB)
💬 Say this in interview:
"Each [item] is roughly [size]. With [N] [items] per day, that's [N × size] per day, or about [Y] TB per year. We'll need scalable object storage – something like S3."
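The same storage math as a one-function Python sketch (the 1M-photos, 2 MB figures are the example's assumptions):

```python
# Storage growth: items/day × bytes/item, reported in TB (10^12 bytes, decimal units).
def daily_storage_tb(items_per_day, item_size_bytes):
    return items_per_day * item_size_bytes / 10**12

per_day = daily_storage_tb(1_000_000, 2 * 10**6)  # 1M photos at 2 MB each
print(per_day, per_day * 365)  # 2.0 TB/day, 730.0 TB/year
```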
3. Bandwidth
📌 Data transfer per second
Example:
- 1 video stream = 5 MB/s = 5 × 10⁶ bytes/sec
- 1 Thousand concurrent users = 10³ users
- 10³ (1K) × 5 × 10⁶ (5MB) bytes/sec = 5 × 10⁹ bytes/sec
Convert bytes/sec → GB/sec:
- 5 × 10⁹ bytes ÷ 10³ = 5 × 10⁶ (5 Million) KB
- 5 × 10⁶ KB ÷ 10³ = 5 × 10³ (5 Thousand) MB
- 5 × 10³ MB ÷ 10³ = 5 GB/sec ✅
Shortcut: 1K users × 5MB = 5 × 10⁹ bytes = 5 GB
(10³ × 10⁶ = 10⁹ = 1 GB → × 5 = 5 GB)
💬 Say this in interview:
"If [N] users are streaming simultaneously and each stream is [X] Mbps, total egress bandwidth is [N × X] Mbps. That tells me we definitely need a CDN to handle this at the edge."
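The bandwidth calculation above, sketched in Python with the example's assumed numbers (1K concurrent viewers, 5 MB/s per stream):

```python
# Egress bandwidth: concurrent streams × bytes/sec per stream, in GB/sec.
def egress_gb_per_sec(concurrent_users, bytes_per_sec_each):
    return concurrent_users * bytes_per_sec_each / 10**9  # 10^9 bytes = 1 GB

print(egress_gb_per_sec(1_000, 5 * 10**6))  # 5.0 GB/sec
```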
4. Latency
📌 Time taken
Sequential example (calls happen one after another → add them up):
User Feed Request:
Auth Service → 20 ms
User Service → 30 ms
Post Service → 50 ms
Ranking Service → 40 ms
─────────────────────────
Total = 140 ms ✅
(within 200ms p99)
Parallel example (calls happen at the same time → take the max):
User Feed Request:
Auth Service → 20 ms ─┐
Post Service → 50 ms ─┤ (all fire at once)
Ranking Service → 40 ms ─┘
─────────────────────────
Total = max(20, 50, 40) = 50 ms ✅
(much faster!)
Rule: Sequential = sum all. Parallel = take the slowest one. Always ask: "Can these calls be parallelized?" – it can cut latency dramatically.
💬 Say this in interview:
"This request involves 4 sequential service calls – auth (20ms), user lookup (30ms), post fetch (50ms), and ranking (40ms) – totaling 140ms, which is within our 200ms p99 target. If we're ever close to the limit, I'd parallelize the post fetch and ranking calls, bringing the total down to about 100ms (20 + 30 + max(50, 40))."
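The sequential-vs-parallel rule is easy to encode; this sketch uses the feed example's latencies:

```python
# Sequential calls add up; parallel calls cost only the slowest one.
def sequential_ms(latencies):
    return sum(latencies)

def parallel_ms(latencies):
    return max(latencies)

feed = [20, 30, 50, 40]  # auth, user, post, ranking (ms)
print(sequential_ms(feed))        # 140 ms – all four in a chain
print(parallel_ms([20, 50, 40]))  # 50 ms – auth, post, ranking fired together
```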
5. Compute (Servers/CPU)
📌 How many CPU cores and servers do we need?
Formula (from Little's law: requests in flight = RPS × latency; assume one core per in-flight request):
CPU cores needed = RPS × latency (in seconds)
Servers needed = CPU cores ÷ cores per server
Example (using the same 10K RPS and 50 ms figures that the Instagram example below produces):
RPS = 10K = 10⁴ (10 Thousand) req/sec
Latency = 50 ms (parallel path from the latency example) = 5 × 10⁻² sec
CPU cores = 10⁴ × 5 × 10⁻² = 500 cores
= 10⁴ × 10⁻² = 10² = 100 → × 5 = 500 cores ✅
Each server = 16 cores (standard)
Servers = 500 ÷ 16 ≈ 32 servers
Add a 2× safety buffer for peak traffic and redundancy:
32 × 2 = ~64 servers ✅
Summary table:
| What | Value | Power of 10 |
|---|---|---|
| RPS | 10K | 10⁴ |
| Latency | 50 ms | 5 × 10⁻² |
| CPU cores | ~500 | 5 × 10² |
| Cores/server | 16 | – |
| Base servers | ~32 | ~3.2 × 10¹ |
| With 2× buffer | ~64 servers | ~6.4 × 10¹ |
Rule of thumb: Always add a 2× safety multiplier for peak traffic and node failures.
💬 Say this in interview:
"We established 10K RPS from our traffic estimation, and from our latency analysis each request takes about 50ms end-to-end. Using the formula – RPS × latency(sec) = CPU cores – that's 10,000 × 0.05 = 500 CPU cores. With 16-core servers, that's about 32 servers. Adding a 2x buffer for peak traffic and redundancy, I'd provision around 64 servers to start, with auto-scaling enabled."
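The sizing formula above can be sketched as a small Python helper (the 16-core server and 2× buffer are the section's stated assumptions):

```python
import math

# Cores ≈ RPS × latency(sec) (Little's law, one core per in-flight request);
# servers = ceil(cores / cores-per-server) × safety buffer.
def servers_needed(rps, latency_sec, cores_per_server=16, safety_factor=2):
    cores = rps * latency_sec
    base_servers = math.ceil(cores / cores_per_server)
    return base_servers * safety_factor

print(servers_needed(10_000, 0.05))  # 500 cores → 32 servers → 64 with buffer
```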
🧠 Golden Technique (How to Think)
Step 1: Break it down
"Users → actions → data"
💬 Say this in interview:
"Let me break this down: how many users, how often they act, and how much data each action generates."
Step 2: Assume smartly
Use round numbers:
- 1K, 1M, 1B
- 1 KB, 1 MB, 1 GB
💬 Say this in interview:
"I'll use round numbers – these are estimates, not exact figures. If you disagree with any assumption, let me know and I'll adjust."
Step 3: Convert to per second
📌 Always go to RPS
"per day ÷ 86,400"
💬 Say this in interview:
"I always convert to per-second numbers – that's what matters for infrastructure sizing. One day is roughly 86,400 seconds, so I'll round to 100K for simplicity."
Step 4: Sanity check
Ask:
"Does this feel realistic?"
💬 Say this in interview:
"Let me sanity check – [X] TB/day feels right for a system of this scale. Instagram reportedly stores petabytes, so we're in the right ballpark."
⚡ Powerful Shortcuts (Rules of Thumb)
| Unit | Value | Power of 10 |
|---|---|---|
| 1 day | ~86K s | ~10⁵ s |
| Thousand | 1K | 10³ |
| Million | 1M | 10⁶ |
| Billion | 1B | 10⁹ |
| Trillion | 1T | 10¹² |
| 1 KB | 1,000 bytes | 10³ B |
| 1 MB | 1,000 KB | 10⁶ B |
| 1 GB | 1,000 MB | 10⁹ B |
| 1 TB | 1,000 GB | 10¹² B |
| 1 PB | 1,000 TB | 10¹⁵ B |
💬 Say this in interview:
"I'll use the standard shortcut – 1 day ≈ 100K seconds. It keeps the math clean and the interviewer can follow along easily."
📝 Real Example (Interview Style)
Design Instagram
Assume:
- 10⁸ (100 Million) users
- 10 posts/day
Load:
- 10⁸ (100M) × 10 = 10⁹ (1 Billion) posts/day
- 10⁹ ÷ 10⁵ (100K sec/day) = 10⁴ = 10K RPS ✅
Storage (with full 1MB posts):
- 1 post = 1 MB = 10⁶ bytes
- 10⁹ (1 Billion) posts × 10⁶ (1MB) bytes = 10¹⁵ bytes
- 10¹⁵ ÷ 10¹² = 1,000 TB = 1 PB/day ✅
Storage (metadata only, 1KB/post):
- 10⁹ (1 Billion) posts × 10³ (1KB) bytes = 10¹² bytes = 1 TB/day ✅
Rule: 1 Billion × 1KB = 1TB → 10⁹ × 10³ = 10¹²
📌 Now you KNOW:
- Need distributed storage
- Need CDN
- Need sharding
💬 Say this in interview:
"100 million DAU, each posting 10 times a day – that's 1 billion write events per day, or about 10,000 writes per second. Each post with metadata and a compressed image is roughly 1 MB – so 1 billion × 1 MB = 1 petabyte per day for new writes. That clearly requires distributed object storage like S3, a CDN for reads, and database sharding for write throughput."
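The whole Instagram estimate fits in a few lines of Python, mirroring the powers-of-10 arithmetic above (all inputs are the example's assumptions):

```python
# Instagram-style end-to-end estimate using the 86,400 s ≈ 10^5 shortcut.
SECONDS_PER_DAY_APPROX = 10**5

users = 10**8                 # 100M users
posts_per_day = users * 10    # 10^9 = 1 billion posts/day

write_rps = posts_per_day // SECONDS_PER_DAY_APPROX     # 10^4 = 10K writes/sec
media_pb_per_day = posts_per_day * 10**6 / 10**15       # 1 MB/post → PB/day
metadata_tb_per_day = posts_per_day * 10**3 / 10**12    # 1 KB/post → TB/day

print(write_rps, media_pb_per_day, metadata_tb_per_day)  # 10000 1.0 1.0
```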
⚖️ Most Important Insight
"Estimation → drives architecture decisions"
Example:
- High RPS → load balancer + scaling
- Huge storage → S3 + sharding
- High bandwidth → CDN
💬 Say this in interview:
"These estimates aren't just numbers – they tell me which architectural components I actually need. High RPS means I need horizontal scaling and a load balancer. Petabyte-scale storage means I can't use a single relational DB – I need object storage and sharding. High bandwidth means I need a CDN to avoid hammering the origin servers."
❌ Common Mistakes
- Trying to be exact ❌
- No assumptions ❌
- Not converting to per second ❌
- Ignoring peak traffic ❌
💬 Say this in interview:
"I'll make sure to state my assumptions explicitly, work in round numbers, convert everything to per-second figures, and account for peak traffic – usually 2–3x the average."
🎤 Interview Script (MEMORIZE THIS)
Start:
"Before designing, let me do a quick back-of-the-envelope estimation to understand system scale and justify my architectural choices."
Assumptions:
"I'll assume 100 million DAU, with each user performing roughly 10 actions per day. That's 1 billion requests per day – about 10,000 to 12,000 RPS on average, and around 30,000 at peak."
Calculate:
"For storage: if each action generates 1 KB of metadata, that's 1 TB of metadata per day. If we also store media at roughly 1 MB each, and 10% of actions include media, that's 100 TB/day of media – about 36 PB per year."
Expand:
"For bandwidth: at 30K RPS with a 1 KB average response, that's 30 MB/sec of read bandwidth. For media serving, if 1% of users stream 1 MB/sec simultaneously, that's 1 TB/sec – we absolutely need a CDN."
Insight:
"Based on these numbers, the system needs: horizontal scaling behind a load balancer, a distributed database with sharding for write throughput, Redis or Memcached for hot reads, object storage like S3 for media, and a CDN for global low-latency delivery."
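If you want to verify the script's figures before memorizing them, the arithmetic checks out in a few lines of Python (every input is one of the script's stated assumptions):

```python
# Sanity-checking the interview script's numbers.
dau = 10**8                                  # 100M daily active users
requests_per_day = dau * 10                  # 1 billion/day
avg_rps = requests_per_day / 86_400          # ≈ 11,574 – "10,000 to 12,000"

metadata_tb_day = requests_per_day * 10**3 / 10**12       # 1 KB each → 1.0 TB/day
media_tb_day = 0.10 * requests_per_day * 10**6 / 10**12   # 10% carry 1 MB media
media_pb_year = media_tb_day * 365 / 1000                 # ≈ 36.5 PB/year

read_mb_sec = 30_000 * 10**3 / 10**6         # 30K RPS × 1 KB responses
stream_tb_sec = 0.01 * dau * 10**6 / 10**12  # 1% of users streaming 1 MB/s

print(round(avg_rps), metadata_tb_day, media_tb_day, media_pb_year,
      read_mb_sec, stream_tb_sec)
```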
🧠 One-Line Summary
"Back-of-the-envelope estimation is used to quickly approximate system scale and guide architecture decisions."
💬 Use this to open or close the estimation section in any interview:
"The goal of estimation isn't precision – it's to make sure my design is built for the right scale, not over-engineered or under-powered."