Query performance in DBMS

When we query a database there are certain operations performed, the main operation to consider is the type of scan it would result in.

The performance is calculated based on the time taken to do the operations.

Various types of operations have different costs based on disk and CPU usage. Cost can be assumed to be the amount of time taken. Therefore, lower the cost better the performance (faster response from DB)

Optimize query by calculating cost

Different kinds of query and data arrangement can result in different types of scan, Our goal is to get the optimal type of scan possible by tweaking query and data arrangement.

Calculation of cost for each type of scan helps us determine what scan results faster response for the number of rows fetched.

For example, when fetching 1000s of rows - Using an index to find the disk location of the rows, Generating bitmap, and then doing random access to fetch data is slower than a sequential scan.

Optimization in Postgres

In Postgres, when we execute a query it flows through the following components,

  1. Parser
  2. Rewrite
  3. Planner
  4. Execute

The Planner decides which type of scan will be best to use to satisfy a query

  • Each kind of scan requires different disk operations to be performed.
  • The planner calculates the disk operation cost for the scan that would resolve a part of the query

Cost calculation is combination of total number of

  • Sequential page reads
  • Random page reads
  • Rows scanned
  • Index entry scan
  • Function / operation

and their configured cost.

Postgres has a default cost calculation configuration

We can see this process in action by using EXPLAIN or EXPLAIN ANALYZE before a query.

  • EXPLAIN Simply builds a query plan and displays info about it.
  • EXPLAIN ANALYZE Builds the plan, runs it, and displays info about it.

These can be used to benchmark and evaluate how well our query performs. Also, not to be used in real data fetching.