Designing a URL Shortening Service like TinyURL
- Author: Muhammad Yasir
Table of Contents
- PHASE 1: UNDERSTAND, SCOPE & CONSTRAINTS
- 1. Clarify Ambiguity + Define Scope
- 2. Functional Requirements
- 3. Non-Functional Requirements
- 4. Constraints & Assumptions
- PHASE 2: ESTIMATION & INTERFACE
- 5. Capacity Estimation
- 6. Interface Definition
- URL Management API
- PHASE 3: HIGH-LEVEL DESIGN
- 7. High Level Architecture
- 8. Data Model & Access Patterns
- 9. Database Selection
- PHASE 4: DEEP DIVE & DETAILED DESIGN
- 10. Core Component Deep Dive
- 11. Caching Strategy (DEEPER)
- 12. Partitioning & Sharding
- 13. Async Processing & Messaging
- 14. Load Balancing & Service Scaling
- 15. Fault Tolerance and Failure Handling
- 16. Multi-Region & Geo Distribution
- 17. Security & Privacy
- 18. Monitoring & Observability
- PHASE 5: EVALUATION
- 19. Trade-offs Analysis
- 20. Bottlenecks & Future Improvements
PHASE 1: UNDERSTAND, SCOPE & CONSTRAINTS
1. Clarify Ambiguity + Define Scope
- What is the primary goal? (Create short aliases for long URLs)
- What happens when a short link is clicked? (Redirect to original URL)
- Are there specific constraints on the short link length?
Before jumping into the design, I want to clarify the scope. We are building a URL shortening service similar to TinyURL. The core goal is to convert long URLs into short, unique aliases and redirect users back to the original destination. I'll focus on the shortening and redirection logic, and exclude complex user account management for now to keep the scope manageable.
2. Functional Requirements
Let me list the core functional requirements - the key actions users will perform.
- Generate a unique short alias for a given long URL.
- Redirect users from the short link to the original long URL.
- Support expiration times for links.
- Provide basic analytics (click counts).
- Support custom short aliases (optional).
3. Non-Functional Requirements
Now I'll define the non-functional requirements, as these will dictate our database and caching choices.
- High Availability: The service must always be available; if it goes down, all redirections fail.
- Low Latency: Redirection should happen in real time.
- Read-Heavy: Redirections will far outnumber URL creations, at roughly a 100:1 ratio.
- Unpredictability: Short links should not be easily guessable (security/spam prevention).
The system must be highly available with minimal redirection latency. Given a projected 100:1 read-to-write ratio, the system is heavily read-oriented. Short links must be unpredictable to prevent malicious scraping.
4. Constraints & Assumptions
I'll make a few assumptions regarding traffic and data retention to guide the capacity estimation.
- Read/Write ratio is 100:1.
- Average storage duration is 5 years.
- Default expiration is 2 years.
PHASE 2: ESTIMATION & INTERFACE
5. Capacity Estimation
I'll perform some back-of-the-envelope calculations to determine the scale of our infrastructure.
Traffic:
- New URLs: 500M/month ≈ 200 URLs/sec (there are ~2.6 x 10^6 seconds in a month and ~10^5 seconds in a day).
- Redirections: 50B/month ≈ 20K requests/sec.
Storage:
- Total objects (5 years): 500M/month × 12 months × 5 years ≈ 30 billion records.
- Size per record: ~500 bytes (assuming the URL plus metadata).
- Total storage: 30B × 500 bytes ≈ 15 TB.
Bandwidth:
- Ingress (Writes): 200 URLs/sec × 500 bytes ≈ 100 KB/s.
- Egress (Reads): 20K requests/sec × 500 bytes ≈ 10 MB/s.
Memory (Cache):
- Following the 80/20 rule, we cache 20% of daily traffic.
- Daily requests: 20K/sec × ~86,400 sec ≈ 1.7 billion requests.
- Cache Size: 0.2 × 1.7B × 500 bytes ≈ 170 GB.
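The estimates above can be reproduced with a short script. The inputs (500M URLs/month, 100:1 ratio, ~500 bytes per record, 5-year retention, 80/20 caching) are the assumptions stated earlier; the record size in particular is an assumption, not a measured value.

```python
# Back-of-the-envelope capacity estimation for the URL shortener.
SECONDS_PER_MONTH = 30 * 24 * 3600   # ~2.6 million
SECONDS_PER_DAY = 24 * 3600          # 86,400

new_urls_per_month = 500_000_000
read_write_ratio = 100
bytes_per_record = 500               # assumed: URL plus metadata
retention_years = 5

writes_per_sec = new_urls_per_month / SECONDS_PER_MONTH        # ~200
reads_per_sec = writes_per_sec * read_write_ratio              # ~20K

total_records = new_urls_per_month * 12 * retention_years      # 30B
total_storage_tb = total_records * bytes_per_record / 1e12     # ~15 TB

ingress_kb_s = writes_per_sec * bytes_per_record / 1e3         # ~100 KB/s
egress_mb_s = reads_per_sec * bytes_per_record / 1e6           # ~10 MB/s

daily_reads = reads_per_sec * SECONDS_PER_DAY                  # ~1.7B
cache_size_gb = 0.2 * daily_reads * bytes_per_record / 1e9     # ~170 GB

print(f"writes/sec ≈ {writes_per_sec:.0f}, reads/sec ≈ {reads_per_sec:.0f}")
print(f"storage ≈ {total_storage_tb:.0f} TB, cache ≈ {cache_size_gb:.0f} GB")
```

Running this confirms the rounded figures used throughout the estimation.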
6. Interface Definition
Now I'll define the REST APIs. I'm using a POST method for creation to handle the payload and a simple GET for the redirection to ensure maximum speed.
Create Short URL
POST /shorten
- Request:
{ original_url, custom_alias?, expiration_date?, user_id? }
- Response:
{ shortened_url, creation_date, expiration_date }
Redirect API
GET /{shortened_url}
- Response:
HTTP 302 Redirect with Location: original_url
Analytics API
GET /analytics/{shortened_url}
- Response:
{ click_count, unique_clicks, location_data, device_data }
URL Management API
Retrieves a list of all URLs shortened by a specific user, with metadata.
GET /user/urls
- Request parameters:
  - user_id (string, required): The ID of the user whose URLs are being requested.
  - page (integer, optional): The page number for paginated results.
  - page_size (integer, optional): The number of results per page.
- Response:
urls (list): A list of URLs shortened by the user, including metadata like creation date and expiration date.
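The core create/redirect logic behind these endpoints can be sketched without an HTTP framework. The in-memory dict stands in for the database, and the random 6-character alias is a placeholder for the key generation discussed later; all names here are illustrative.

```python
# Minimal in-memory sketch of the /shorten and /{shortened_url} handlers.
import secrets
import string
from datetime import datetime, timedelta, timezone

ALPHABET = string.digits + string.ascii_letters  # Base62 characters
store = {}                                       # shortURL -> record

def shorten(original_url, custom_alias=None, ttl_days=730):
    """POST /shorten: create a short alias for original_url."""
    alias = custom_alias or "".join(secrets.choice(ALPHABET) for _ in range(6))
    if alias in store:
        raise ValueError("alias already taken")
    now = datetime.now(timezone.utc)
    store[alias] = {
        "original_url": original_url,
        "creation_date": now,
        "expiration_date": now + timedelta(days=ttl_days),
    }
    return {"shortened_url": alias, **store[alias]}

def redirect(alias):
    """GET /{shortened_url}: return (status, Location) for the redirect."""
    record = store.get(alias)
    if record is None:
        return 404, ""
    if record["expiration_date"] < datetime.now(timezone.utc):
        return 410, ""  # expired link
    return 302, record["original_url"]
```

Note the redirect path does a single key lookup, which is what makes the heavy caching discussed later so effective.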
PHASE 3: HIGH-LEVEL DESIGN
7. High Level Architecture
- The system follows a layered architecture: Client → CDN → Load Balancer → API servers → Cache → Database.
- A dedicated Key Generation Service (KGS) provides unique short keys to API servers, decoupling key creation from request handling and ensuring constant-time allocation.
- Frequently accessed URLs are served from CDN (edge caching) or Redis (in-memory cache) to minimize latency and reduce database load.
- The database acts as the source of truth, with cache used as a performance layer.
- Expiration is handled via cache TTL, read-time validation, and background cleanup jobs.
8. Data Model & Access Patterns
- URL Table:
  - ShortURL (PK), OriginalURL, UserID, CreationDate, ExpirationDate
- User Table:
  - UserID (PK), UserDetails
- Design Notes:
  - ShortURL as the primary key enables fast lookups for redirects
  - Denormalized schema avoids joins
  - ExpirationDate supports TTL and validation logic
9. Database Selection
- Choice: NoSQL (DynamoDB, Cassandra).
- Reasoning:
  - Access pattern is a simple key-value lookup: shortURL → originalURL
  - No joins required
  - Handles billions of records with horizontal scaling
  - Optimized for high write throughput and low-latency reads
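The two tables can be sketched as dataclasses. The field types are illustrative assumptions, since the schema above only names the columns.

```python
# Sketch of the URL and User tables as Python dataclasses.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class URLRecord:
    short_url: str            # partition key; drives the redirect lookup
    original_url: str
    user_id: str
    creation_date: datetime
    expiration_date: datetime

@dataclass
class UserRecord:
    user_id: str              # partition key
    user_details: dict = field(default_factory=dict)
```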
PHASE 4: DEEP DIVE & DETAILED DESIGN
10. Core Component Deep Dive
To generate the short key, I'll propose a Key Generation Service (KGS). This avoids the collision problems associated with hashing and allows us to provide keys in constant time.
Approach 1: Counter + Base62 (recommended in real-world)
- Use distributed counter (e.g., Twitter Snowflake)
- Convert numeric ID → Base62
Example:
123456789 → "aZ91xK"
✅ Pros:
- No collisions
- Ordered IDs (debugging easier)
- No storage needed for unused keys
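Approach 1 can be sketched in a few lines. Note the exact alias depends on the alphabet ordering chosen, so the "aZ91xK" example above is illustrative; the 0-9/a-z/A-Z ordering assumed here produces a different string for the same ID.

```python
# Base62 encoding for Approach 1: a numeric counter ID becomes a short alias.
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62

def encode(n):
    """Convert a non-negative counter value to a Base62 alias."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def decode(alias):
    """Invert encode(), recovering the original counter value."""
    n = 0
    for ch in alias:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

Because the mapping is a pure base conversion, no collision check is ever needed: distinct counter values always yield distinct aliases.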
Approach 2: Random pre-generated keys
- Generate random 6-char strings
- Store in DB
❗ Trade-off:
- Requires storage (~hundreds of GB for billions of keys)
- Slight overhead vs counter-based
🚀 Why KGS is superior (core reasoning)
- Guaranteed uniqueness
  - No collision checks
  - No retry logic
- Constant-time allocation
  - No hashing + DB lookup loop
- Predictable latency
  - Critical for high-scale systems
- Better scalability
  - Independent scaling of key generation
❓ What if KGS becomes a bottleneck?
- Mitigation:
  - Batch key fetching (app servers rarely call KGS)
  - Horizontal scaling of KGS
  - Partition the key space across multiple KGS nodes
❓ What if KGS crashes?
- App servers continue using cached keys
- If those are exhausted:
  - Failover to a replica KGS
  - Or temporary write degradation
❓ What if an app server crashes after fetching keys?
- Some keys are lost (never used)
✔ Acceptable because:
- Key space is huge (billions/trillions)
- Avoids expensive coordination
❓ How do we ensure no duplicate keys across servers?
- KGS uses:
  - Atomic operations (DB transactions / Redis INCR)
  - Or partitioned key ranges per node
❓ What if the key space is exhausted?
- Increase the key length:
  - 6 chars → ~56B combinations
  - 7 chars → ~3.5T combinations
✔ Can be done without breaking existing URLs
❓ Why not generate keys directly in the app server?
- That leads to:
  - Race conditions
  - Duplicate keys
  - Need for distributed locking
✔ KGS centralizes this responsibility safely
I prefer the KGS because it guarantees uniqueness and provides constant-time key allocation without collision handling. By prefetching keys and distributing key ranges, we keep the KGS out of the critical path while maintaining scalability and fault tolerance.
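The prefetch-and-partition scheme can be sketched as follows. Class and method names are hypothetical, and the in-process lock stands in for an atomic operation (e.g. Redis INCRBY) that a real deployment would use.

```python
# KGS hands out disjoint counter ranges; app servers allocate IDs locally
# in O(1) and call the KGS only when their prefetched batch runs out.
import threading

class KeyGenerationService:
    """Hands out disjoint [start, start + batch) counter ranges."""
    def __init__(self, batch_size=1000):
        self._counter = 0
        self._batch = batch_size
        self._lock = threading.Lock()   # stand-in for an atomic DB operation

    def fetch_range(self):
        with self._lock:
            start = self._counter
            self._counter += self._batch
            return range(start, start + self._batch)

class AppServer:
    """Consumes a prefetched range; refetches only when exhausted."""
    def __init__(self, kgs):
        self._kgs = kgs
        self._keys = iter(kgs.fetch_range())

    def next_id(self):
        try:
            return next(self._keys)
        except StopIteration:           # batch exhausted: refill
            self._keys = iter(self._kgs.fetch_range())
            return next(self._keys)
```

Because the ranges never overlap, two app servers can allocate concurrently with no coordination; a crashed server simply abandons the unused tail of its range, which the huge key space tolerates.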
11. Caching Strategy (DEEPER)
- Use Redis as the primary cache layer (in-memory, low latency).
- Apply the Cache-Aside pattern:
  - Read → check cache → fall back to DB → update cache
- Cache key: shortURL → originalURL
- Use a TTL aligned with the expiration time to auto-evict expired links
Why this works:
- Reduces DB load significantly (~80–90% cache hit rate)
- Ensures low-latency redirects
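The cache-aside read path with an expiry-capped TTL can be sketched like this; the two dicts stand in for Redis and the URL table, and the one-hour default TTL is an illustrative choice.

```python
# Cache-aside: check the cache, fall back to the DB on a miss, then
# populate the cache with a TTL that never outlives the link itself.
import time

db = {}      # shortURL -> (originalURL, expires_at_epoch)
cache = {}   # shortURL -> (originalURL, cached_until_epoch)
CACHE_TTL = 3600  # illustrative default

def resolve(short_url, now=None):
    now = time.time() if now is None else now
    hit = cache.get(short_url)
    if hit and hit[1] > now:
        return hit[0]                       # cache hit
    row = db.get(short_url)                 # cache miss: DB lookup
    if row is None or row[1] <= now:
        return None                         # unknown or expired link
    ttl = min(CACHE_TTL, row[1] - now)      # cap TTL at link expiry
    cache[short_url] = (row[0], now + ttl)
    return row[0]
```

Capping the TTL at the link's own expiration means the cache can never serve a link past its expiry, which keeps expiration handling correct without a separate invalidation step.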
12. Partitioning & Sharding
- Use hash-based partitioning on shortURL
- Distributes data evenly across DB nodes
- Apply consistent hashing for scalability and minimal rebalancing
Why:
- Avoids hotspot partitions
- Enables horizontal scaling to billions of records
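Consistent hashing can be sketched with a ring of virtual nodes; node names and the virtual-node count below are illustrative.

```python
# Consistent hashing: shortURL keys map onto a ring of virtual nodes, so
# adding or removing a DB node only remaps a small slice of the keys.
import bisect
import hashlib

def _hash(value):
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        # Each physical node gets `vnodes` positions for even spread.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def node_for(self, short_url):
        """Pick the first virtual node clockwise from the key's hash."""
        idx = bisect.bisect(self._keys, _hash(short_url)) % len(self._keys)
        return self._ring[idx][1]
```

Because only the ring positions between a departed node and its neighbors change owners, rebalancing touches roughly 1/N of the keys rather than all of them.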
13. Async Processing & Messaging
- Use Kafka / a queue system for:
  - Analytics (click tracking)
  - Logging
  - Cleanup jobs
- The redirect flow remains synchronous; analytics is asynchronous
Why:
- Keeps user-facing latency low
- Decouples heavy processing from critical path
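The split between the synchronous redirect path and asynchronous analytics can be sketched with an in-process queue standing in for Kafka; the names here are illustrative.

```python
# Redirect path stays O(1): it only enqueues the click. A background
# worker (stand-in for a Kafka consumer) aggregates counts off-path.
import queue
import threading
from collections import Counter

click_queue = queue.Queue()
click_counts = Counter()

def record_click(short_url):
    """Called on the redirect path; never blocks on analytics work."""
    click_queue.put(short_url)

def analytics_worker():
    """Background consumer aggregating click counts."""
    while True:
        short_url = click_queue.get()
        if short_url is None:        # sentinel: shut down cleanly
            break
        click_counts[short_url] += 1

worker = threading.Thread(target=analytics_worker, daemon=True)
worker.start()
```

If the analytics pipeline falls behind or fails, clicks queue up (or, in the worst case, are dropped) while redirects keep serving at full speed, which is exactly the trade-off we want.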
14. Load Balancing & Service Scaling
- Use a Load Balancer (L4/L7) between:
  - Client → API servers
  - API servers → Cache/DB
- API servers are stateless
- Enable horizontal auto-scaling
Why:
- Eliminates single point of failure
- Handles traffic spikes efficiently
15. Fault Tolerance and Failure Handling
- Cache failure → fallback to DB
- DB failure → use replication / failover
- KGS failure → continue with pre-fetched keys
- Retries with backoff for transient failures
Why:
- Ensures high availability
- Graceful degradation instead of full system failure
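The "retries with backoff" bullet can be sketched as a decorator; the delays and the choice of `ConnectionError` as the retryable exception are illustrative, and production code would also add jitter.

```python
# Retry with exponential backoff for transient cache/DB failures.
import functools
import time

def retry(max_attempts=3, base_delay=0.05):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except ConnectionError:
                    if attempt == max_attempts - 1:
                        raise                # out of retries: surface error
                    time.sleep(base_delay * 2 ** attempt)  # 50ms, 100ms, ...
        return wrapper
    return decorator
```

Bounding the attempts matters: unbounded retries against a struggling database turn a transient blip into a retry storm, whereas three bounded attempts with growing delays let most blips heal invisibly.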
16. Multi-Region & Geo Distribution
- Deploy system across multiple regions
- Use Geo-DNS / routing to direct users to nearest region
- Replicate data asynchronously across regions
Why:
- Reduces latency for global users
- Improves availability and disaster recovery
17. Security & Privacy
- Use HTTPS for secure communication
- Authentication via JWT/OAuth (optional for users)
- Validate URLs to prevent malicious links
- Rate limiting to prevent abuse
Why:
- Protects user data
- Prevents system misuse (spam/abuse)
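The section only names rate limiting; a token bucket is one common way to implement it, sketched here per client with an injectable clock for testability.

```python
# Token bucket: each client refills at `rate` tokens/sec up to `capacity`,
# allowing short bursts while capping sustained request rate.
import time

class TokenBucket:
    def __init__(self, rate, capacity, now=None):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                # over the limit: reject (e.g. HTTP 429)
```

A gateway would keep one bucket per API key or IP; rejected requests get HTTP 429 rather than consuming shortening or redirect capacity.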
18. Monitoring & Observability
- Track:
  - Latency (p95/p99)
  - QPS
  - Error rates
- Use:
  - Prometheus + Grafana (metrics)
  - ELK stack (logs)
Why:
- Enables quick detection of issues
- Helps in scaling and optimization decisions
PHASE 5: EVALUATION
19. Trade-offs Analysis
- KGS vs Hashing
  - KGS → no collisions, predictable latency, but adds system complexity
  - Hashing → simpler, but requires collision handling and retries
- Cache vs Freshness
  - Caching improves latency and reduces DB load
  - May serve slightly stale data (acceptable for this system)
- NoSQL vs SQL
  - NoSQL → better horizontal scalability and high throughput
  - SQL → stronger consistency but harder to scale at large volume
- Pre-fetching keys (KGS)
  - Improves performance (O(1) allocation)
  - May waste some keys if servers crash
20. Bottlenecks & Future Improvements
- Hot URLs (viral links)
  - Bottleneck at the cache/DB
  - Improve via CDN caching and cache replication
- Cache limitations
  - Increase cache size or use a distributed cache cluster
- Database scaling
  - Add more shards / partitions as data grows
- Analytics overhead
  - Move fully to async pipelines (Kafka + a data warehouse)
- Future Improvements
  - Add ranking / smart redirection (geo/device-based)
  - Introduce ML-based link analytics
  - Improve abuse detection and rate limiting