How Databases Store Data
A database feels magical — ask a question, get an answer instantly from millions of records. Understanding how data is actually stored and retrieved on disk removes the magic and gives you the tools to build fast, reliable systems.
From Disk to Query
When you ask a database for a record, the result appears in milliseconds — but behind the scenes, data passes through several layers of storage and organisation. Understanding these layers explains why database performance choices matter.
At the lowest level, all data is bits on a storage device. A database system's job is to organise those bits into structures that make finding, updating, and protecting data as fast and reliable as possible.
The Storage Hierarchy
CPU Registers ~1 nanosecond — tiny, extremely fast
CPU Cache (L1/L2) ~5 nanoseconds — holds recently used data
RAM / Main Memory ~100 nanoseconds — database buffer pool lives here
─────────────────────────────────────────────────
SSD (Solid State) ~50 microseconds — primary storage today
HDD (Hard Disk) ~10 milliseconds — older systems, backups
Network Storage ~1-100ms+ — cloud databases, distributed
Databases keep frequently accessed data in RAM (the buffer pool) and write permanently to disk. The art of database tuning is minimising expensive disk reads by managing the buffer pool intelligently.
Pages and Blocks
Databases don't read or write individual rows one at a time. They read and write data in fixed-size chunks called pages (or blocks). A typical page is 8KB or 16KB. This is the fundamental unit of database I/O.
When you query a single row, the database reads the entire page containing that row. If your row is 100 bytes and the page is 8,192 bytes, you're loading 82 rows worth of data even if you only asked for one. This is why choosing the right indexes is critical — they let the database find the right page without scanning all pages.
Indexes — The Key to Fast Lookups
Without an index, finding a record means scanning every page in the table — a full table scan. For a table with 10 million rows, that could mean reading gigabytes of data. An index is a separate data structure that maps column values to the pages where matching rows live.
The most common index structure is the B-Tree (Balanced Tree). It keeps values sorted and allows any lookup in O(log n) time — meaning finding one record in 1 billion takes only about 30 comparisons.
Without index: scan all 1,000,000 rows
→ 1,000,000 comparisons
With B-Tree index: tree height ≈ log₂(n)
→ log₂(1,000,000) ≈ 20 comparisons
B-Tree on student_id:
[500]
/ [250] [750]
/ \ / [125] [375] [625] [875]
Each node points to a page on disk
ACID — The Guarantee of Reliability
Database transactions must satisfy four properties known as ACID. These guarantees are why you trust a bank database to never lose a transfer, even if the server crashes mid-transaction.