
Data Compression vs Deduplication - System Design Interview Guide

  • Compression → Shrinks data inside a file
  • Deduplication → Removes duplicate data across files

Script

"Compression reduces size within data. Deduplication removes repeated data across the system."


2. Data Compression (High Signal)

What it does

  • Encodes data using fewer bits

  • Works within a single file/stream

  • Types:

    • Lossless (exact recovery)
    • Lossy (some data removed)
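The lossless case can be sketched with Python's stdlib `zlib` (the payload is an arbitrary example):

```python
import zlib

# A repetitive payload compresses well; zlib is lossless,
# so decompression recovers the exact original bytes.
original = b"GET /api/users HTTP/1.1\r\n" * 100

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

assert restored == original             # lossless: exact recovery
assert len(compressed) < len(original)  # encoded with fewer bits
print(f"{len(original)} -> {len(compressed)} bytes")
```

Lossy codecs (JPEG, MP3) instead discard detail, so the original bytes cannot be recovered.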

Key Insight

Optimizes storage + network bandwidth


Architecture Signals (Use Compression when)

  • Need faster network transfer (CDN, APIs)
  • Storing large files (images, videos, logs)
  • Bandwidth is expensive
  • Real-time systems need smaller payloads

Problems

  • CPU overhead (compress/decompress)
  • Lossy → quality degradation

FAANG Q&A

Q1: Why gzip APIs? → Reduce payload → faster response time.

Q2: When avoid compression? → Already compressed data (JPEG, MP4).

Q3: Tradeoff? → CPU vs bandwidth.


Script

"I use compression to reduce storage and network cost. It works at the file level and is useful for transmission efficiency."


3. Data Deduplication (High Signal)

What it does

  • Removes duplicate blocks/files
  • Stores one copy + references
  • Works across entire system
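The "one copy + references" mechanic can be sketched as a toy content-addressed store (`DedupStore` is an illustrative name, not a real library):

```python
import hashlib

class DedupStore:
    """Stores each unique blob once; duplicates become references (hashes)."""
    def __init__(self):
        self.blobs = {}   # content hash -> bytes (one physical copy)

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(key, data)   # store only if new
        return key                          # caller keeps the reference

    def get(self, key: str) -> bytes:
        return self.blobs[key]

store = DedupStore()
ref1 = store.put(b"quarterly-report.pdf contents")
ref2 = store.put(b"quarterly-report.pdf contents")  # duplicate upload
assert ref1 == ref2                  # same content -> same reference
assert len(store.blobs) == 1         # only one physical copy kept
assert store.get(ref1) == b"quarterly-report.pdf contents"
```

Reads follow the reference back to the single stored copy, which is why dedup needs hashing and an index.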

Key Insight

Optimizes storage at scale


Architecture Signals (Use Deduplication when)

  • Backup systems (daily snapshots)
  • Cloud storage (same files repeated)
  • Logs / documents with redundancy
  • Large-scale storage (TB→PB)

Problems

  • Needs hashing/indexing → CPU/memory cost
  • Only works for identical data

FAANG Q&A

Q1: Why dedup in backups? → Same files repeated → huge storage savings.

Q2: File-level vs block-level dedup?

  • File → simple, less efficient
  • Block → complex, more savings
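The Q2 tradeoff in toy form: two file versions that share their first half. File-level hashing sees two distinct files, while block-level hashing stores the shared half once (chunk size and contents are illustrative):

```python
import hashlib

def chunks(data: bytes, size: int = 16):
    return [data[i:i + size] for i in range(0, len(data), size)]

v1 = b"A" * 64 + b"B" * 64
v2 = b"A" * 64 + b"C" * 64   # only the second half changed

# File-level: one hash per file -- any change means a whole new copy.
file_hashes = {hashlib.sha256(f).hexdigest() for f in (v1, v2)}
assert len(file_hashes) == 2   # no sharing at all

# Block-level: hash each chunk -- unchanged chunks are stored once.
block_hashes = {hashlib.sha256(c).hexdigest()
                for f in (v1, v2) for c in chunks(f)}
assert len(block_hashes) == 3  # shared "A" chunk + two distinct halves
```

16 logical chunks collapse to 3 physical ones, at the cost of a larger index to maintain.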

Q3: Tradeoff? → Storage saved vs compute overhead.


Script

"I use deduplication in large-scale storage systems to eliminate redundant data and save space across datasets."


4. Key Differences (Interview Table)

| Aspect    | Compression        | Deduplication           |
|-----------|--------------------|-------------------------|
| Scope     | Within file        | Across system           |
| Method    | Encode efficiently | Remove duplicates       |
| Use Case  | Network + storage  | Storage optimization    |
| Data Type | Any data           | Only identical data     |
| Recovery  | Decompress         | Use references          |
| CPU Cost  | Medium             | High (hashing/indexing) |

Script

"Compression reduces redundancy within data, while deduplication removes redundancy across data."


5. Real Architecture Decision (FAANG Level)

Use BOTH together (Very Important)

Upload → Compression (reduce size)
Storage → Deduplication (remove duplicates)

Why?

  • Compression → saves bandwidth
  • Dedup → saves storage

FAANG Q&A

Q: Which comes first? → Dedup BEFORE compression (important!)

Why? → Compression changes data → duplicates become unrecognizable.
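The ordering rule can be demonstrated with stdlib `zlib` and `hashlib`: two versions of a file that differ only in their first line share many raw chunks, but almost none once compressed, because one early change ripples through the whole compressed stream (chunk size and data are illustrative):

```python
import hashlib
import zlib

def chunk_hashes(data: bytes, size: int = 32) -> set:
    """Hashes of fixed-size chunks -- a stand-in for block-level dedup."""
    return {hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)}

common = b"shared log history line\n" * 200
v1 = b"run=A\n" + common
v2 = b"run=B\n" + common   # only the first line differs

# Dedup first: the shared tail produces many identical chunks.
raw_shared = chunk_hashes(v1) & chunk_hashes(v2)
assert len(raw_shared) > 0

# Compress first: one differing byte early on changes the rest of the
# compressed stream, so the two versions share almost no chunks.
comp_shared = chunk_hashes(zlib.compress(v1)) & chunk_hashes(zlib.compress(v2))
assert len(comp_shared) < len(raw_shared)
```

Hence the pipeline: deduplicate the raw blocks, then compress the unique ones.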


Script

"In real systems, I first deduplicate to remove duplicates, then compress to reduce size further."


6. Strong Signals vs Weak Signals

Choose Compression (Strong Signals)

  • "Reduce API payload"
  • "Improve latency"
  • "Streaming / CDN"
  • "Bandwidth optimization"

Choose Deduplication (Strong Signals)

  • "Backup system"
  • "Repeated files/data"
  • "Storage cost problem"
  • "Large-scale storage"

Weak Signals

  • "Big data → use compression only" ✗
  • "Storage issue → always dedup" ✗

7. FAANG-Level Insights (Added)

1. Chunking Strategy (Dedup)

  • Fixed-size vs variable-size chunks → variable = better dedup ratio
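Why variable-size (content-defined) chunks dedup better: a toy rolling-sum chunker, a simplified stand-in for the Rabin fingerprinting real systems use. `cdc_chunks` and all parameters here are illustrative:

```python
import random

def cdc_chunks(data: bytes, window: int = 16, mask: int = 0x3F) -> list:
    """Toy content-defined chunking: cut wherever a rolling sum of the
    last `window` bytes hits a boundary pattern. Cut points depend only
    on nearby content, so an insertion shifts only nearby chunks;
    production systems add min/max chunk sizes and real fingerprints."""
    chunks, start, rolling = [], 0, 0
    for i, b in enumerate(data):
        rolling += b
        if i >= window:
            rolling -= data[i - window]   # keep the sum over the window
        if (rolling & mask) == 0:         # content-defined boundary
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

random.seed(0)
doc = bytes(random.randrange(256) for _ in range(4096))
edited = b"\x00" + doc                    # insert one byte at the front

assert b"".join(cdc_chunks(doc)) == doc   # chunks reassemble exactly

# Fixed-size chunking: the insertion shifts every later boundary.
fixed = lambda d: {d[i:i + 64] for i in range(0, len(d), 64)}
# Content-defined chunking: boundaries resynchronize after the edit,
# so most chunks are still shared and dedup still finds them.
assert len(set(cdc_chunks(doc)) & set(cdc_chunks(edited))) > \
       len(fixed(doc) & fixed(edited))
```

With fixed-size chunks, one inserted byte makes nearly every downstream chunk "new"; content-defined boundaries recover after the edit.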

2. Content Addressable Storage

  • Hash → data identity (used in Git, S3-like systems)

3. Inline vs Post-process Dedup

  • Inline → during the write path (adds write latency, saves space immediately)
  • Post-process → runs later (fast ingestion, needs temporary extra storage)

4. Compression Algorithms

  • gzip (general)
  • Snappy (fast, less compression)
  • LZ4 (real-time systems)
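Snappy and LZ4 are third-party packages, but the same speed-vs-ratio dial can be illustrated with stdlib `zlib` compression levels (the payload is an arbitrary example):

```python
import zlib

payload = b'{"user": "alice", "action": "login", "ok": true}\n' * 500

fast = zlib.compress(payload, level=1)   # speed-first, Snappy/LZ4 in spirit
small = zlib.compress(payload, level=9)  # spend more CPU for a smaller output

assert zlib.decompress(fast) == payload  # both are lossless
assert zlib.decompress(small) == payload
assert len(small) <= len(fast)           # higher level -> smaller (or equal)
```

Real systems pick per path: fast codecs on hot RPC/streaming paths, higher-ratio codecs for cold storage.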

Final Ultra-Short Summary

Golden Line

"Compression shrinks data. Deduplication eliminates repetition."

