Query Optimizer Quick Start Guide

Version: 7.0 Status: Production Ready Last Updated: 2026-01-04

Introduction

This guide helps you quickly understand and use HeliosDB’s query optimizer. You will learn how to analyze query plans, identify performance issues, and apply basic optimization techniques.

Using EXPLAIN ANALYZE

The most important tool for understanding query performance is EXPLAIN ANALYZE.

Basic EXPLAIN

-- See the query plan without executing
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;

/*
Output:
Index Scan using idx_orders_customer on orders
  (cost=0.43..8.45 rows=10 width=120)
  Index Cond: (customer_id = 123)

Interpretation:
- Uses index scan (good!)
- cost=0.43..8.45: startup..total cost
- rows=10: estimated matching rows
- width=120: average row size in bytes
*/

EXPLAIN ANALYZE (with actual execution)

-- Execute the query and show actual vs estimated
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;

/*
Output:
Index Scan using idx_orders_customer on orders
  (cost=0.43..8.45 rows=10 width=120)
  (actual time=0.025..0.032 ms rows=8 loops=1)
  Index Cond: (customer_id = 123)
Planning Time: 0.15 ms
Execution Time: 0.08 ms

Key Metrics:
- actual time: 0.025..0.032 ms (startup..total)
- rows=8: actual rows returned (vs estimated 10)
- loops=1: number of executions
*/

EXPLAIN with All Options

-- Maximum detail
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, FORMAT TEXT)
SELECT * FROM orders WHERE customer_id = 123;

/*
Output:
Index Scan using idx_orders_customer on public.orders o
  (cost=0.43..8.45 rows=10 width=120)
  (actual time=0.025..0.032 ms rows=8 loops=1)
  Output: id, customer_id, order_date, amount, status
  Index Cond: (o.customer_id = 123)
  Buffers: shared hit=3
Planning:
  Buffers: shared hit=10
Planning Time: 0.15 ms
Execution Time: 0.08 ms

Buffer Analysis:
- shared hit=3: 3 pages read from cache (fast)
- shared read=0: 0 pages read from disk (good)
*/

Reading Query Plans

Plan Node Types

Node Type	Description	Performance
Seq Scan	Full table scan	Slow for large tables
Index Scan	B-tree index lookup + table fetch	Fast for selective queries
Index Only Scan	All data from index	Fastest (no table access)
Bitmap Index Scan	Multiple index merge	Good for OR conditions
Hash Join	Hash table join	Fast for large joins
Merge Join	Sorted merge join	Fast for pre-sorted data
Nested Loop	Iterative lookup join	Fast for small outer tables

Cost Interpretation

cost=startup..total

Ranges:
  0-100:      Very fast
  100-1000:   Fast
  1000-10000: Moderate
  10000+:     Slow (needs optimization)

Example: Comparing Good vs Bad Plans

Bad Plan (Sequential Scan):

-- Missing index on customer_id
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;

/*
Seq Scan on orders
  (cost=0.00..25000.00 rows=10 width=120)
  (actual time=50.5..245.8 ms rows=8 loops=1)
  Filter: (customer_id = 123)
  Rows Removed by Filter: 999992

Problems:
- Seq Scan: Reading entire table
- 245.8 ms: Very slow
- Rows Removed: 999992 (scanning 1M rows for 8 results!)
*/

Good Plan (Index Scan):

-- With index on customer_id
CREATE INDEX idx_orders_customer ON orders(customer_id);
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;

/*
Index Scan using idx_orders_customer on orders
  (cost=0.43..8.45 rows=10 width=120)
  (actual time=0.025..0.032 ms rows=8 loops=1)
  Index Cond: (customer_id = 123)

Improvements:
- Index Scan: Direct lookup
- 0.032 ms: 7,681x faster!
- No rows filtered (only reads needed rows)
*/

Basic Optimization Techniques

1. Create Indexes for WHERE Clauses

-- Identify slow query
EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'john@example.com';
-- Seq Scan (cost=5000.00, time=150ms)

-- Create index
CREATE INDEX idx_users_email ON users(email);

-- Verify improvement
EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'john@example.com';
-- Index Scan (cost=8.45, time=0.05ms)

2. Use Composite Indexes for Multiple Conditions

-- Query with multiple conditions
SELECT * FROM orders
WHERE customer_id = 123
  AND order_date > '2025-01-01';

-- Create composite index (columns in order of selectivity)
CREATE INDEX idx_orders_cust_date ON orders(customer_id, order_date);

-- Verify both conditions use index
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE customer_id = 123
  AND order_date > '2025-01-01';

/*
Index Scan using idx_orders_cust_date on orders
  Index Cond: ((customer_id = 123) AND (order_date > '2025-01-01'))
*/

3. Avoid Functions on Indexed Columns

-- BAD: Function prevents index use
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';
-- Seq Scan (function applied to every row)

-- GOOD: Store normalized data
SELECT * FROM users WHERE email_lower = 'john@example.com';
-- Index Scan

-- OR: Create functional index
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';
-- Index Scan

4. Use LIMIT for Large Result Sets

-- BAD: Returns 1 million rows
SELECT * FROM logs WHERE level = 'ERROR';
-- Slow, memory intensive

-- GOOD: Limit results
SELECT * FROM logs
WHERE level = 'ERROR'
ORDER BY timestamp DESC
LIMIT 100;
-- Fast, returns only recent errors

5. Update Statistics Regularly

-- Check when table was last analyzed
SELECT
  relname,
  last_analyze,
  last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'orders';

-- Update statistics if stale
ANALYZE orders;

-- Verify improved estimates
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;

Common Performance Patterns

Pattern 1: Index-Only Scan (Fastest)

-- Create covering index
CREATE INDEX idx_orders_covering
ON orders(customer_id) INCLUDE (order_date, amount);

-- Query uses only indexed columns
SELECT customer_id, order_date, amount
FROM orders WHERE customer_id = 123;

/*
Index Only Scan using idx_orders_covering
  Heap Fetches: 0  <-- No table access!
*/

Pattern 2: Bitmap Index Scan (Multiple Conditions)

-- Query with OR condition
SELECT * FROM products
WHERE category = 'Electronics' OR price < 10;

/*
Bitmap Heap Scan on products
  Recheck Cond: ((category = 'Electronics') OR (price < 10))
  -> BitmapOr
       -> Bitmap Index Scan on idx_products_category
       -> Bitmap Index Scan on idx_products_price

Explanation: Combines results from two index scans
*/

Pattern 3: Efficient Join (Hash Join)

-- Join two tables
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id;

/*
Hash Join
  Hash Cond: (o.customer_id = c.id)
  -> Seq Scan on orders o
  -> Hash
       -> Seq Scan on customers c

Explanation: Builds hash table from smaller table (customers),
probes with larger table (orders). O(n+m) complexity.
*/

Pattern 4: Parallel Query (Large Tables)

-- Aggregate query on large table
SELECT region, SUM(amount)
FROM sales
GROUP BY region;

/*
Finalize GroupAggregate
  -> Gather Merge
       Workers Planned: 4
       -> Partial GroupAggregate
            -> Parallel Seq Scan on sales

Explanation: Uses 4 parallel workers for 4x speedup
*/

Quick Optimization Checklist

Before Running a Query

Does the WHERE clause have an index?
Are JOIN columns indexed?
Are statistics up to date?
Is the result set limited?

After EXPLAIN ANALYZE

Check for these warning signs:

Warning Sign	Meaning	Solution
Seq Scan on large table	No useful index	Create index
Rows Removed by Filter (high)	Low selectivity	More selective predicate
actual rows >> estimated	Stale statistics	Run ANALYZE
Nested Loop (large outer)	Wrong join type	Check memory, indexes
Sort (disk)	Insufficient memory	Increase work_mem

Quick Fixes

-- 1. Missing index
CREATE INDEX idx_table_column ON table(column);

-- 2. Stale statistics
ANALYZE table_name;

-- 3. Need more sort/hash memory
SET work_mem = '256MB';

-- 4. Function on indexed column
CREATE INDEX idx_func ON table(FUNCTION(column));

-- 5. Low selectivity OR condition
-- Rewrite as UNION ALL
SELECT * FROM t WHERE a = 1
UNION ALL
SELECT * FROM t WHERE b = 2;

EXPLAIN Output Formats

Text (Default)

EXPLAIN SELECT * FROM orders;
-- Human-readable tree format

JSON (For Tools)

EXPLAIN (FORMAT JSON) SELECT * FROM orders;
-- Structured JSON for programmatic analysis

YAML (Configuration-Friendly)

EXPLAIN (FORMAT YAML) SELECT * FROM orders;
-- Easy to read structured format

AI Mode (Natural Language)

EXPLAIN (AI) SELECT * FROM orders WHERE customer_id = 123;

/*
This query searches the orders table for a specific customer.
It uses an index on customer_id, making the lookup very fast.

Performance: FAST (estimated 0.1ms)
No optimization suggestions - query is already optimal.
*/

Common Questions

Q: Why isn’t my index being used?

EXPLAIN ANALYZE (WHY_NOT)
SELECT * FROM users WHERE age > 20;

/*
WHY_NOT Analysis:
Index idx_users_age exists but not used because:
  - Selectivity: 80% (too many rows match)
  - Sequential scan cost: 1000
  - Index scan cost: 3200 (3.2x more expensive)

Suggestion: Index scans are efficient when <20% of rows match.
This query returns 80% of rows, so sequential scan is faster.
*/

Q: How do I optimize a slow JOIN?

-- Check if join columns are indexed
\d orders
-- Look for idx_orders_customer_id

-- If missing, create index
CREATE INDEX idx_orders_customer_id ON orders(customer_id);

-- Ensure both tables have fresh statistics
ANALYZE orders;
ANALYZE customers;

Q: How do I know if my query is parallel?

EXPLAIN ANALYZE SELECT SUM(amount) FROM sales;

/*
Look for:
  Gather or Gather Merge
  Workers Planned: 4
  Workers Launched: 4
  Parallel Seq Scan

If missing, check:
  SHOW max_parallel_workers_per_gather;
  SHOW min_parallel_table_scan_size;
*/

Next Steps

Read: User Guide for advanced techniques
Study: Troubleshooting Guide for common issues
Explore: Query Optimizer Overview for architecture details
Practice: Run EXPLAIN ANALYZE on your own queries

Quick Reference

Essential Commands

-- Basic plan
EXPLAIN SELECT ...;

-- With execution stats
EXPLAIN ANALYZE SELECT ...;

-- Full detail
EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT ...;

-- AI explanation
EXPLAIN (AI) SELECT ...;

-- Why not analysis
EXPLAIN ANALYZE (WHY_NOT) SELECT ...;

-- Update statistics
ANALYZE table_name;

-- View index usage
SELECT * FROM pg_stat_user_indexes;

Cost Thresholds

Cost Range	Performance	Action
< 100	Excellent	None needed
100-1000	Good	Monitor
1000-10000	Moderate	Consider optimization
> 10000	Poor	Optimize immediately

Query Optimizer Quick Start - Master query performance in minutes.