Module 7 · NoSQL, Search, Graph, Object · Day 067 · 25 min

Data Lake vs Warehouse vs Lakehouse

Cheap raw bytes vs fast structured queries, and the middle ground between them.

Memory hook

Data Lake vs Warehouse vs Lakehouse: cheap raw bytes vs fast structured queries

Mental model

Match the datastore to the access pattern.

Design lens

Warehouse SQL is typically faster, but compute is billed per query or per credit.

Recall anchors
Lake · Lakehouse · Warehouse

Why it matters

Data lakes hold raw data cheaply in object storage; warehouses hold structured, columnar data for fast SQL. Lakehouse architectures, built on table formats such as Apache Iceberg and Delta Lake, store warehouse-like tables on lake storage.

Deep dive

Warehouse (Snowflake, BigQuery, Redshift): fast SQL, but expensive compute.

Lake (for example, Parquet files on S3): cheap storage, and you bring your own compute engine, such as Spark or Trino.

Lakehouse: ACID tables on lake storage, bridging the gap between the two.
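
To make "ACID tables on lake storage" more concrete, here is a minimal sketch of the general idea behind table formats: data files are immutable, and a table version is just a snapshot listing which files belong to it. A commit publishes a new snapshot in one step, and a writer that planned against a stale snapshot is rejected. The `ToyTable` and `Snapshot` classes and the file paths are hypothetical names for illustration; this is not the Iceberg or Delta Lake API, only a toy model of optimistic commits.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Snapshot:
    """An immutable list of the data files that make up one table version."""
    snapshot_id: int
    data_files: Tuple[str, ...]


@dataclass
class ToyTable:
    """Toy table metadata: a snapshot log whose last entry is the current version."""
    snapshots: List[Snapshot] = field(default_factory=lambda: [Snapshot(0, ())])

    @property
    def current(self) -> Snapshot:
        return self.snapshots[-1]

    def commit_append(self, base_id: int, new_files: List[str]) -> Snapshot:
        """Publish new files as one new snapshot, or fail if the table moved on."""
        if base_id != self.current.snapshot_id:
            # Optimistic concurrency: the writer planned against a stale version.
            raise RuntimeError("commit conflict: table changed since base snapshot")
        snap = Snapshot(base_id + 1, self.current.data_files + tuple(new_files))
        self.snapshots.append(snap)
        return snap


table = ToyTable()
reader_view = table.current  # a reader pins snapshot 0 before the write

# Hypothetical object-store path, used only as a label here.
table.commit_append(0, ["s3://lake/events/part-0001.parquet"])

try:
    # A second writer that still thinks snapshot 0 is current must retry.
    table.commit_append(0, ["s3://lake/events/part-0002.parquet"])
    conflict = False
except RuntimeError:
    conflict = True
```

The reader that pinned snapshot 0 keeps seeing an empty table while the new snapshot becomes visible to later readers all at once, which is the isolation property that plain files in a bucket do not give you on their own.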

Demo / scenario

Build a daily revenue report.

  1. Stream events into the lake as Parquet files.
  2. Organize those files into partitioned Iceberg tables.
  3. Query the lakehouse tables with Trino for ad-hoc analysis.
  4. Feed dashboards from materialized aggregates.
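
The steps above can be sketched in plain Python to show the data flow, without any real lake, Iceberg, or Trino involved. In this sketch an in-memory dict stands in for partitioned storage, and the event fields (`ts`, `order_id`, `amount_cents`), the `event_date=...` partition naming, and the sample events are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical raw events; the field names and values are illustrative only.
EVENTS = [
    {"ts": "2024-03-01T09:15:00+00:00", "order_id": "o1", "amount_cents": 1250},
    {"ts": "2024-03-01T17:40:00+00:00", "order_id": "o2", "amount_cents": 800},
    {"ts": "2024-03-02T08:05:00+00:00", "order_id": "o3", "amount_cents": 4300},
]


def partition_key(event):
    """Derive a date-based partition name from the event timestamp (UTC)."""
    day = datetime.fromisoformat(event["ts"]).astimezone(timezone.utc).date()
    return f"event_date={day.isoformat()}"


def land_events(events):
    """Steps 1-2: group raw events into date partitions."""
    partitions = defaultdict(list)
    for event in events:
        partitions[partition_key(event)].append(event)
    return dict(partitions)


def daily_revenue(partitions, days=None):
    """Steps 3-4: scan only the requested partitions and sum revenue per day."""
    wanted = None if days is None else {f"event_date={d}" for d in days}
    report = {}
    for key, rows in partitions.items():
        if wanted is not None and key not in wanted:
            continue  # partition pruning: skip days the query did not ask for
        report[key.split("=", 1)[1]] = sum(r["amount_cents"] for r in rows)
    return report


lake = land_events(EVENTS)
report = daily_revenue(lake)                        # full aggregate for dashboards
one_day = daily_revenue(lake, days=["2024-03-02"])  # ad-hoc query on one partition
```

Partitioning by date is what lets the ad-hoc query touch one day of data instead of the whole table, and the per-day totals play the role of the materialized aggregate that a dashboard would read.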

Tradeoffs

  • Warehouse SQL is typically faster, but compute is billed per query or per credit.
  • A lake gives ML workloads direct access to raw data at low storage cost.
  • A lakehouse adds catalog and metadata management complexity.

Diagram

[Diagram: Lake, Lakehouse, Warehouse. Three modes of analytical storage.]
