Designing a URL Shortening Service like TinyURL
- Author: Muhammad Yasir
Table of Contents
- PHASE 1: UNDERSTAND, SCOPE & CONSTRAINTS
- 1. Clarify Ambiguity + Define Scope
- 2. Functional Requirements
- 3. Non-Functional Requirements
- 4. Constraints & Assumptions
- PHASE 2: ESTIMATION & INTERFACE
- 5. Capacity Estimation
- 6. Interface Definition
- URL Management API
- PHASE 3: HIGH-LEVEL DESIGN
- 7. High Level Architecture
- 8. Data Model & Access Patterns
- 9. Database Selection
- PHASE 4: DEEP DIVE & DETAILED DESIGN
- 10. Core Component Deep Dive
- 11. Caching Strategy (DEEPER)
- 12. Partitioning & Sharding
- 13. Async Processing & Messaging
- 14. Load Balancing & Service Scaling
- 15. Fault Tolerance and Failure Handling
- 16. Multi-Region & Geo Distribution
- 17. Security & Privacy
- 18. Monitoring & Observability
- PHASE 5: EVALUATION
- 19. Trade-offs Analysis
- 20. Bottlenecks & Future Improvements
PHASE 1: UNDERSTAND, SCOPE & CONSTRAINTS
1. Clarify Ambiguity + Define Scope
- What is the primary goal? (Create short aliases for long URLs)
- What happens when a short link is clicked? (Redirect to original URL)
- Are there specific constraints on the short link length?
Before jumping into the design, I want to clarify the scope. We are building a URL shortening service similar to TinyURL. The core goal is to convert long URLs into short, unique aliases and redirect users back to the original destination. I'll focus on the shortening and redirection logic, and exclude complex user account management for now to keep the scope manageable.
2. Functional Requirements
Let me list the core functional requirements - the key actions users will perform.
- Generate a unique short alias for a given long URL.
- Redirect users from the short link to the original long URL.
- Support expiration times for links.
- Provide basic analytics (click counts).
- Support custom short aliases (optional).
3. Non-Functional Requirements
Now I'll define the non-functional requirements, as these will dictate our database and caching choices.
- High Availability: The service must always be available; if it goes down, all redirections fail.
- Low Latency: Redirection should happen in real time.
- Read-Heavy: Redirections will far outnumber URL creations, at roughly a 100:1 ratio.
- Unpredictability: Short links should not be easily guessable (security/spam prevention).
The system must be highly available with minimal redirection latency. Given a projected 100:1 read-to-write ratio, the system is heavily read-oriented. Short links must be unpredictable to prevent malicious scraping.
4. Constraints & Assumptions
I'll make a few assumptions regarding traffic and data retention to guide the capacity estimation.
- Read/Write ratio is 100:1.
- Average storage duration is 5 years.
- Default expiration is 2 years.
PHASE 2: ESTIMATION & INTERFACE
5. Capacity Estimation
I'll perform some back-of-the-envelope calculations to determine the scale of our infrastructure.
Traffic:
- New URLs: 500M/month ≈ 200 URLs/sec (there are ~2.6 x 10^6 seconds in a month and ~10^5 seconds in a day).
- Redirections: 50B/month ≈ 20K requests/sec.
Storage:
- Total objects (5 years): 500M/month × 12 months × 5 years ≈ 30 billion records.
- Size per record: ~500 bytes (assuming the URL plus metadata).
- Total storage: 30B × 500 bytes ≈ 15 TB.
Bandwidth:
- Ingress (Writes): 200 URLs/sec × 500 bytes ≈ 100 KB/s.
- Egress (Reads): 20K requests/sec × 500 bytes ≈ 10 MB/s.
Memory (Cache):
- Following the 80/20 rule, we cache 20% of daily traffic.
- Daily requests: 20K/sec × ~86,400 sec ≈ 1.7 billion requests.
- Cache Size: 0.2 × 1.7B × 500 bytes ≈ 170 GB.
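The estimates above can be reproduced with a short script. The inputs (500M URLs/month, 100:1 ratio, ~500 bytes per record, 5-year retention, 80/20 caching) are the assumptions stated earlier; the record size in particular is an assumption, not a measured value.

```python
# Back-of-the-envelope capacity estimation for the URL shortener.
SECONDS_PER_MONTH = 30 * 24 * 3600   # ~2.6 million
SECONDS_PER_DAY = 24 * 3600          # 86,400

new_urls_per_month = 500_000_000
read_write_ratio = 100
bytes_per_record = 500               # assumed: URL plus metadata
retention_years = 5

writes_per_sec = new_urls_per_month / SECONDS_PER_MONTH        # ~200
reads_per_sec = writes_per_sec * read_write_ratio              # ~20K

total_records = new_urls_per_month * 12 * retention_years      # 30B
total_storage_tb = total_records * bytes_per_record / 1e12     # ~15 TB

ingress_kb_s = writes_per_sec * bytes_per_record / 1e3         # ~100 KB/s
egress_mb_s = reads_per_sec * bytes_per_record / 1e6           # ~10 MB/s

daily_reads = reads_per_sec * SECONDS_PER_DAY                  # ~1.7B
cache_size_gb = 0.2 * daily_reads * bytes_per_record / 1e9     # ~170 GB

print(f"writes/sec ≈ {writes_per_sec:.0f}, reads/sec ≈ {reads_per_sec:.0f}")
print(f"storage ≈ {total_storage_tb:.0f} TB, cache ≈ {cache_size_gb:.0f} GB")
```

Running this confirms the rounded figures used throughout the estimation.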
6. Interface Definition
Now I'll define the REST APIs. I'm using a POST method for creation to handle the payload and a simple GET for the redirection to ensure maximum speed.
Create Short URL
POST /shorten
- Request:
{ original_url, custom_alias?, expiration_date?, user_id? }
- Response:
{ shortened_url, creation_date, expiration_date }
Redirect API
GET /{shortened_url}
- Response:
HTTP 302 Redirect with Location: original_url
Analytics API
GET /analytics/{shortened_url}
- Response:
{ click_count, unique_clicks, location_data, device_data }
URL Management API
Retrieves a list of all URLs shortened by a specific user, with metadata.
GET /user/urls
- Request parameters:
  - user_id (string, required): The ID of the user whose URLs are being requested.
  - page (integer, optional): The page number for paginated results.
  - page_size (integer, optional): The number of results per page.
- Response:
urls (list): A list of URLs shortened by the user, including metadata like creation date and expiration date.
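The core create/redirect logic behind these endpoints can be sketched without an HTTP framework. The in-memory dict stands in for the database, and the random 6-character alias is a placeholder for the key generation discussed later; all names here are illustrative.

```python
# Minimal in-memory sketch of the /shorten and /{shortened_url} handlers.
import secrets
import string
from datetime import datetime, timedelta, timezone

ALPHABET = string.digits + string.ascii_letters  # Base62 characters
store = {}                                       # shortURL -> record

def shorten(original_url, custom_alias=None, ttl_days=730):
    """POST /shorten: create a short alias for original_url."""
    alias = custom_alias or "".join(secrets.choice(ALPHABET) for _ in range(6))
    if alias in store:
        raise ValueError("alias already taken")
    now = datetime.now(timezone.utc)
    store[alias] = {
        "original_url": original_url,
        "creation_date": now,
        "expiration_date": now + timedelta(days=ttl_days),
    }
    return {"shortened_url": alias, **store[alias]}

def redirect(alias):
    """GET /{shortened_url}: return (status, Location) for the redirect."""
    record = store.get(alias)
    if record is None:
        return 404, ""
    if record["expiration_date"] < datetime.now(timezone.utc):
        return 410, ""  # expired link
    return 302, record["original_url"]
```

Note the redirect path does a single key lookup, which is what makes the heavy caching discussed later so effective.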
PHASE 3: HIGH-LEVEL DESIGN
7. High Level Architecture
- The system follows a layered architecture: Client → CDN → Load Balancer → API servers → Cache → Database.
- A dedicated Key Generation Service (KGS) provides unique short keys to API servers, decoupling key creation from request handling and ensuring constant-time allocation.
- Frequently accessed URLs are served from CDN (edge caching) or Redis (in-memory cache) to minimize latency and reduce database load.
- The database acts as the source of truth, with cache used as a performance layer.
- Expiration is handled via cache TTL, read-time validation, and background cleanup jobs.
8. Data Model & Access Patterns
- URL Table:
  - ShortURL (PK), OriginalURL, UserID, CreationDate, ExpirationDate
- User Table:
  - UserID (PK), UserDetails
- Design Notes:
  - ShortURL as the primary key enables fast lookups for redirects
  - Denormalized schema avoids joins
  - ExpirationDate supports TTL and validation logic
9. Database Selection
- Choice: NoSQL (DynamoDB, Cassandra).
- Reasoning:
  - Access pattern is a simple key-value lookup: shortURL → originalURL
  - No joins required
  - Handles billions of records with horizontal scaling
  - Optimized for high write throughput and low-latency reads
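The two tables can be sketched as dataclasses. The field types are illustrative assumptions, since the schema above only names the columns.

```python
# Sketch of the URL and User tables as Python dataclasses.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class URLRecord:
    short_url: str            # partition key; drives the redirect lookup
    original_url: str
    user_id: str
    creation_date: datetime
    expiration_date: datetime

@dataclass
class UserRecord:
    user_id: str              # partition key
    user_details: dict = field(default_factory=dict)
```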
PHASE 4: DEEP DIVE & DETAILED DESIGN
10. Core Component Deep Dive
To generate the short key, I'll propose a Key Generation Service (KGS). This avoids the collision problems associated with hashing and allows us to provide keys in constant time.
Approach 1: Counter + Base62 (recommended in real-world)
- Use distributed counter (e.g., Twitter Snowflake)
- Convert numeric ID → Base62
Example:
123456789 → "aZ91xK"
✅ Pros:
- No collisions
- Ordered IDs (debugging easier)
- No storage needed for unused keys
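Approach 1 can be sketched in a few lines. Note the exact alias depends on the alphabet ordering chosen, so the "aZ91xK" example above is illustrative; the 0-9/a-z/A-Z ordering assumed here produces a different string for the same ID.

```python
# Base62 encoding for Approach 1: a numeric counter ID becomes a short alias.
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62

def encode(n):
    """Convert a non-negative counter value to a Base62 alias."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def decode(alias):
    """Invert encode(), recovering the original counter value."""
    n = 0
    for ch in alias:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

Because the mapping is a pure base conversion, no collision check is ever needed: distinct counter values always yield distinct aliases.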
Approach 2: Random pre-generated keys
- Generate random 6-char strings
- Store in DB
❗ Trade-off:
- Requires storage (~hundreds of GB for billions of keys)
- Slight overhead vs counter-based
🚀 Why KGS is superior (core reasoning)
- Guaranteed uniqueness
  - No collision checks
  - No retry logic
- Constant-time allocation
  - No hashing + DB lookup loop
- Predictable latency
  - Critical for high-scale systems
- Better scalability
  - Independent scaling of key generation
❓ What if KGS becomes a bottleneck?
- Mitigation:
  - Batch key fetching (app servers rarely call KGS)
  - Horizontal scaling of KGS
  - Partition the key space across multiple KGS nodes
❓ What if KGS crashes?
- App servers continue using cached keys
- If those are exhausted:
  - Failover to a replica KGS
  - Or temporary write degradation
❓ What if an app server crashes after fetching keys?
- Some keys are lost (never used)
✔ Acceptable because:
- Key space is huge (billions/trillions)
- Avoids expensive coordination
❓ How do we ensure no duplicate keys across servers?
- KGS uses:
  - Atomic operations (DB transactions / Redis INCR)
  - Or partitioned key ranges per node
❓ What if the key space is exhausted?
- Increase the key length:
  - 6 chars → ~56B combinations
  - 7 chars → ~3.5T combinations
✔ Can be done without breaking existing URLs
❓ Why not generate keys directly in the app server?
- That leads to:
  - Race conditions
  - Duplicate keys
  - Need for distributed locking
✔ KGS centralizes this responsibility safely
I prefer the KGS because it guarantees uniqueness and provides constant-time key allocation without collision handling. By prefetching keys and distributing key ranges, we keep the KGS out of the critical path while maintaining scalability and fault tolerance.
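The prefetch-and-partition scheme can be sketched as follows. Class and method names are hypothetical, and the in-process lock stands in for an atomic operation (e.g. Redis INCRBY) that a real deployment would use.

```python
# KGS hands out disjoint counter ranges; app servers allocate IDs locally
# in O(1) and call the KGS only when their prefetched batch runs out.
import threading

class KeyGenerationService:
    """Hands out disjoint [start, start + batch) counter ranges."""
    def __init__(self, batch_size=1000):
        self._counter = 0
        self._batch = batch_size
        self._lock = threading.Lock()   # stand-in for an atomic DB operation

    def fetch_range(self):
        with self._lock:
            start = self._counter
            self._counter += self._batch
            return range(start, start + self._batch)

class AppServer:
    """Consumes a prefetched range; refetches only when exhausted."""
    def __init__(self, kgs):
        self._kgs = kgs
        self._keys = iter(kgs.fetch_range())

    def next_id(self):
        try:
            return next(self._keys)
        except StopIteration:           # batch exhausted: refill
            self._keys = iter(self._kgs.fetch_range())
            return next(self._keys)
```

Because the ranges never overlap, two app servers can allocate concurrently with no coordination; a crashed server simply abandons the unused tail of its range, which the huge key space tolerates.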
11. Caching Strategy (DEEPER)
- Use Redis as the primary cache layer (in-memory, low latency).
- Apply the Cache-Aside pattern:
  - Read → check cache → fall back to DB → update cache
- Cache key: shortURL → originalURL
- Use a TTL aligned with the expiration time to auto-evict expired links
Why this works:
- Reduces DB load significantly (~80–90% cache hit rate)
- Ensures low-latency redirects
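The cache-aside read path with an expiry-capped TTL can be sketched like this; the two dicts stand in for Redis and the URL table, and the one-hour default TTL is an illustrative choice.

```python
# Cache-aside: check the cache, fall back to the DB on a miss, then
# populate the cache with a TTL that never outlives the link itself.
import time

db = {}      # shortURL -> (originalURL, expires_at_epoch)
cache = {}   # shortURL -> (originalURL, cached_until_epoch)
CACHE_TTL = 3600  # illustrative default

def resolve(short_url, now=None):
    now = time.time() if now is None else now
    hit = cache.get(short_url)
    if hit and hit[1] > now:
        return hit[0]                       # cache hit
    row = db.get(short_url)                 # cache miss: DB lookup
    if row is None or row[1] <= now:
        return None                         # unknown or expired link
    ttl = min(CACHE_TTL, row[1] - now)      # cap TTL at link expiry
    cache[short_url] = (row[0], now + ttl)
    return row[0]
```

Capping the TTL at the link's own expiration means the cache can never serve a link past its expiry, which keeps expiration handling correct without a separate invalidation step.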
12. Partitioning & Sharding
- Use hash-based partitioning on shortURL
- Distributes data evenly across DB nodes
- Apply consistent hashing for scalability and minimal rebalancing
Why:
- Avoids hotspot partitions
- Enables horizontal scaling to billions of records
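Consistent hashing can be sketched with a ring of virtual nodes; node names and the virtual-node count below are illustrative.

```python
# Consistent hashing: shortURL keys map onto a ring of virtual nodes, so
# adding or removing a DB node only remaps a small slice of the keys.
import bisect
import hashlib

def _hash(value):
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        # Each physical node gets `vnodes` positions for even spread.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def node_for(self, short_url):
        """Pick the first virtual node clockwise from the key's hash."""
        idx = bisect.bisect(self._keys, _hash(short_url)) % len(self._keys)
        return self._ring[idx][1]
```

Because only the ring positions between a departed node and its neighbors change owners, rebalancing touches roughly 1/N of the keys rather than all of them.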
13. Async Processing & Messaging
- Use Kafka / a queue system for:
  - Analytics (click tracking)
  - Logging
  - Cleanup jobs
- The redirect flow remains synchronous; analytics is asynchronous
Why:
- Keeps user-facing latency low
- Decouples heavy processing from critical path
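The split between the synchronous redirect path and asynchronous analytics can be sketched with an in-process queue standing in for Kafka; the names here are illustrative.

```python
# Redirect path stays O(1): it only enqueues the click. A background
# worker (stand-in for a Kafka consumer) aggregates counts off-path.
import queue
import threading
from collections import Counter

click_queue = queue.Queue()
click_counts = Counter()

def record_click(short_url):
    """Called on the redirect path; never blocks on analytics work."""
    click_queue.put(short_url)

def analytics_worker():
    """Background consumer aggregating click counts."""
    while True:
        short_url = click_queue.get()
        if short_url is None:        # sentinel: shut down cleanly
            break
        click_counts[short_url] += 1

worker = threading.Thread(target=analytics_worker, daemon=True)
worker.start()
```

If the analytics pipeline falls behind or fails, clicks queue up (or, in the worst case, are dropped) while redirects keep serving at full speed, which is exactly the trade-off we want.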
14. Load Balancing & Service Scaling
- Use a Load Balancer (L4/L7) between:
  - Client → API servers
  - API servers → Cache/DB
- API servers are stateless
- Enable horizontal auto-scaling
Why:
- Eliminates single point of failure
- Handles traffic spikes efficiently
15. Fault Tolerance and Failure Handling
- Cache failure → fallback to DB
- DB failure → use replication / failover
- KGS failure → continue with pre-fetched keys
- Retries with backoff for transient failures
Why:
- Ensures high availability
- Graceful degradation instead of full system failure
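The "retries with backoff" bullet can be sketched as a decorator; the delays and the choice of `ConnectionError` as the retryable exception are illustrative, and production code would also add jitter.

```python
# Retry with exponential backoff for transient cache/DB failures.
import functools
import time

def retry(max_attempts=3, base_delay=0.05):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except ConnectionError:
                    if attempt == max_attempts - 1:
                        raise                # out of retries: surface error
                    time.sleep(base_delay * 2 ** attempt)  # 50ms, 100ms, ...
        return wrapper
    return decorator
```

Bounding the attempts matters: unbounded retries against a struggling database turn a transient blip into a retry storm, whereas three bounded attempts with growing delays let most blips heal invisibly.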
16. Multi-Region & Geo Distribution
- Deploy system across multiple regions
- Use Geo-DNS / routing to direct users to nearest region
- Replicate data asynchronously across regions
Why:
- Reduces latency for global users
- Improves availability and disaster recovery
17. Security & Privacy
- Use HTTPS for secure communication
- Authentication via JWT/OAuth (optional for users)
- Validate URLs to prevent malicious links
- Rate limiting to prevent abuse
Why:
- Protects user data
- Prevents system misuse (spam/abuse)
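The section only names rate limiting; a token bucket is one common way to implement it, sketched here per client with an injectable clock for testability.

```python
# Token bucket: each client refills at `rate` tokens/sec up to `capacity`,
# allowing short bursts while capping sustained request rate.
import time

class TokenBucket:
    def __init__(self, rate, capacity, now=None):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                # over the limit: reject (e.g. HTTP 429)
```

A gateway would keep one bucket per API key or IP; rejected requests get HTTP 429 rather than consuming shortening or redirect capacity.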
18. Monitoring & Observability
- Track:
  - Latency (p95/p99)
  - QPS
  - Error rates
- Use:
  - Prometheus + Grafana (metrics)
  - ELK stack (logs)
Why:
- Enables quick detection of issues
- Helps in scaling and optimization decisions
PHASE 5: EVALUATION
19. Trade-offs Analysis
- KGS vs Hashing
  - KGS → no collisions, predictable latency, but adds system complexity
  - Hashing → simpler, but requires collision handling and retries
- Cache vs Freshness
  - Caching improves latency and reduces DB load
  - May serve slightly stale data (acceptable for this system)
- NoSQL vs SQL
  - NoSQL → better horizontal scalability and high throughput
  - SQL → stronger consistency but harder to scale at large volume
- Pre-fetching keys (KGS)
  - Improves performance (O(1) allocation)
  - May waste some keys if servers crash
20. Bottlenecks & Future Improvements
- Hot URLs (viral links)
  - Bottleneck at the cache/DB
  - Improve via CDN caching and cache replication
- Cache limitations
  - Increase cache size or use a distributed cache cluster
- Database scaling
  - Add more shards / partitions as data grows
- Analytics overhead
  - Move fully to async pipelines (Kafka + a data warehouse)
- Future Improvements
  - Add ranking / smart redirection (geo/device-based)
  - Introduce ML-based link analytics
  - Improve abuse detection and rate limiting