Database data storage on disk

Database persists information on disk in different forms like flat files, ISAM, heap files etc.. Depending on the Database Engine.

In a common scenario, we store some information typed into a text editor by clicking save and choosing a location.

Similarly, database saves our data into a configured location, mostly in a bit complicated structure than a text file. Database engines will structure the data and files in different ways to be able to retrieve them faster.

So, what kind of information is stored in the database?

In a social media application.

  • A user’s profile, posts, comments are all to be saved in a database.
  • This information is then to be looked up whenever the application needs to show us the information.

In an eCommerce application.

  • Things like products, purchase details, reviews, user info etc… are all saved in a database.

And all of these are saved somewhere in one or multiple disks in their servers. In reality, it involves more than just DB and also advanced procedures than simply storing in the disk, but generally, this is the idea.

When we query the database, the data is retrieved from the disk and loaded into the RAM. Then, the DB engine might perform different operations, to finally provide us with the data corresponding to the query.

Responding to queries as fast as possible

Operations within RAM is always considered fast, So Its important to arrange the data in disk.. such that the lookup is as fast as possible. Availability of data is critical in most applications.

We can confirm that everything is some sort of processing on certain data. And most of the applications need saving a state (data) and modifying it

for example, in a social media app:

  • Application saves our profile information like names and other things in a database
  • We can log in and modify the information which will be updated in the database

As users of web apps, we know how fast online apps respond to these operations. Keep in mind that a smart database has some sort of role in the fast operation experience for a lot of users.

The goal of the database is to respond to queries as fast as possible. It can’t rely on purely going up and fetching data from disk directly. So it has some features that would enable faster lookups.

Let’s see how the database looks up this information on the Disk. With an example from Postgres

DB Disk storage in Postgresql

Postgres can be configured to use a directory to dump all its files, There are mainly 3 terms to understand to explore Postgres disk storage.

  1. HeapFile

Binary files in which the data is written to, For starters, we can think each table we create is written into a file.

  1. Page

Heapfiles are divided into multiple blocks or pages. Each page is divided by 8kb in size. Pages are structured with a header, Tuple ID, Tuples and a special section.

  1. Tuples

Individual entries in a table (rows) are the tuples that are stored inside the pages.

Advanced information about these can be found here at Official PostgreSQL Doc

Here are few ways to get access to the actual files in Postgres

  • The query SHOW data_directory; will reveal where the files and folders for Postgres DB are stored.
  • SELECT oid, database FROM pg_database; will reveal the folder name corresponding to each database created in Postgres.
  • SELECT oid, relname FROM pg_class; will reveal what objects(relname - we can find table name here) are saved in each of the files(oid).

Once you find the files, you can use a Hex editor tool (Available in vs code or some standalone editor can be downloaded online) to open the file and see the actual characters of data stored inside our DB table.