Query Optimizer Quick Start Guide
Query Optimizer Quick Start Guide
Version: 7.0 Status: Production Ready Last Updated: 2026-01-04
Introduction
This guide helps you quickly understand and use HeliosDB’s query optimizer. You will learn how to analyze query plans, identify performance issues, and apply basic optimization techniques.
Using EXPLAIN ANALYZE
The most important tool for understanding query performance is EXPLAIN ANALYZE.
Basic EXPLAIN
-- See the query plan without executingEXPLAIN SELECT * FROM orders WHERE customer_id = 123;
/*Output:Index Scan using idx_orders_customer on orders (cost=0.43..8.45 rows=10 width=120) Index Cond: (customer_id = 123)
Interpretation:- Uses index scan (good!)- cost=0.43..8.45: startup..total cost- rows=10: estimated matching rows- width=120: average row size in bytes*/EXPLAIN ANALYZE (with actual execution)
-- Execute the query and show actual vs estimatedEXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;
/*Output:Index Scan using idx_orders_customer on orders (cost=0.43..8.45 rows=10 width=120) (actual time=0.025..0.032 ms rows=8 loops=1) Index Cond: (customer_id = 123)Planning Time: 0.15 msExecution Time: 0.08 ms
Key Metrics:- actual time: 0.025..0.032 ms (startup..total)- rows=8: actual rows returned (vs estimated 10)- loops=1: number of executions*/EXPLAIN with All Options
-- Maximum detailEXPLAIN (ANALYZE, BUFFERS, VERBOSE, FORMAT TEXT)SELECT * FROM orders WHERE customer_id = 123;
/*Output:Index Scan using idx_orders_customer on public.orders o (cost=0.43..8.45 rows=10 width=120) (actual time=0.025..0.032 ms rows=8 loops=1) Output: id, customer_id, order_date, amount, status Index Cond: (o.customer_id = 123) Buffers: shared hit=3Planning: Buffers: shared hit=10Planning Time: 0.15 msExecution Time: 0.08 ms
Buffer Analysis:- shared hit=3: 3 pages read from cache (fast)- shared read=0: 0 pages read from disk (good)*/Reading Query Plans
Plan Node Types
| Node Type | Description | Performance |
|---|---|---|
| Seq Scan | Full table scan | Slow for large tables |
| Index Scan | B-tree index lookup + table fetch | Fast for selective queries |
| Index Only Scan | All data from index | Fastest (no table access) |
| Bitmap Index Scan | Multiple index merge | Good for OR conditions |
| Hash Join | Hash table join | Fast for large joins |
| Merge Join | Sorted merge join | Fast for pre-sorted data |
| Nested Loop | Iterative lookup join | Fast for small outer tables |
Cost Interpretation
cost=startup..total
Ranges: 0-100: Very fast 100-1000: Fast 1000-10000: Moderate 10000+: Slow (needs optimization)Example: Comparing Good vs Bad Plans
Bad Plan (Sequential Scan):
-- Missing index on customer_idEXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;
/*Seq Scan on orders (cost=0.00..25000.00 rows=10 width=120) (actual time=50.5..245.8 ms rows=8 loops=1) Filter: (customer_id = 123) Rows Removed by Filter: 999992
Problems:- Seq Scan: Reading entire table- 245.8 ms: Very slow- Rows Removed: 999992 (scanning 1M rows for 8 results!)*/Good Plan (Index Scan):
-- With index on customer_idCREATE INDEX idx_orders_customer ON orders(customer_id);EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;
/*Index Scan using idx_orders_customer on orders (cost=0.43..8.45 rows=10 width=120) (actual time=0.025..0.032 ms rows=8 loops=1) Index Cond: (customer_id = 123)
Improvements:- Index Scan: Direct lookup- 0.032 ms: 7,681x faster!- No rows filtered (only reads needed rows)*/Basic Optimization Techniques
1. Create Indexes for WHERE Clauses
-- Identify slow queryEXPLAIN ANALYZESELECT * FROM users WHERE email = 'john@example.com';-- Seq Scan (cost=5000.00, time=150ms)
-- Create indexCREATE INDEX idx_users_email ON users(email);
-- Verify improvementEXPLAIN ANALYZESELECT * FROM users WHERE email = 'john@example.com';-- Index Scan (cost=8.45, time=0.05ms)2. Use Composite Indexes for Multiple Conditions
-- Query with multiple conditionsSELECT * FROM ordersWHERE customer_id = 123 AND order_date > '2025-01-01';
-- Create composite index (columns in order of selectivity)CREATE INDEX idx_orders_cust_date ON orders(customer_id, order_date);
-- Verify both conditions use indexEXPLAIN ANALYZESELECT * FROM ordersWHERE customer_id = 123 AND order_date > '2025-01-01';
/*Index Scan using idx_orders_cust_date on orders Index Cond: ((customer_id = 123) AND (order_date > '2025-01-01'))*/3. Avoid Functions on Indexed Columns
-- BAD: Function prevents index useSELECT * FROM users WHERE LOWER(email) = 'john@example.com';-- Seq Scan (function applied to every row)
-- GOOD: Store normalized dataSELECT * FROM users WHERE email_lower = 'john@example.com';-- Index Scan
-- OR: Create functional indexCREATE INDEX idx_users_email_lower ON users(LOWER(email));SELECT * FROM users WHERE LOWER(email) = 'john@example.com';-- Index Scan4. Use LIMIT for Large Result Sets
-- BAD: Returns 1 million rowsSELECT * FROM logs WHERE level = 'ERROR';-- Slow, memory intensive
-- GOOD: Limit resultsSELECT * FROM logsWHERE level = 'ERROR'ORDER BY timestamp DESCLIMIT 100;-- Fast, returns only recent errors5. Update Statistics Regularly
-- Check when table was last analyzedSELECT relname, last_analyze, last_autoanalyzeFROM pg_stat_user_tablesWHERE relname = 'orders';
-- Update statistics if staleANALYZE orders;
-- Verify improved estimatesEXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;Common Performance Patterns
Pattern 1: Index-Only Scan (Fastest)
-- Create covering indexCREATE INDEX idx_orders_coveringON orders(customer_id) INCLUDE (order_date, amount);
-- Query uses only indexed columnsSELECT customer_id, order_date, amountFROM orders WHERE customer_id = 123;
/*Index Only Scan using idx_orders_covering Heap Fetches: 0 <-- No table access!*/Pattern 2: Bitmap Index Scan (Multiple Conditions)
-- Query with OR conditionSELECT * FROM productsWHERE category = 'Electronics' OR price < 10;
/*Bitmap Heap Scan on products Recheck Cond: ((category = 'Electronics') OR (price < 10)) -> BitmapOr -> Bitmap Index Scan on idx_products_category -> Bitmap Index Scan on idx_products_price
Explanation: Combines results from two index scans*/Pattern 3: Efficient Join (Hash Join)
-- Join two tablesSELECT o.*, c.nameFROM orders oJOIN customers c ON o.customer_id = c.id;
/*Hash Join Hash Cond: (o.customer_id = c.id) -> Seq Scan on orders o -> Hash -> Seq Scan on customers c
Explanation: Builds hash table from smaller table (customers),probes with larger table (orders). O(n+m) complexity.*/Pattern 4: Parallel Query (Large Tables)
-- Aggregate query on large tableSELECT region, SUM(amount)FROM salesGROUP BY region;
/*Finalize GroupAggregate -> Gather Merge Workers Planned: 4 -> Partial GroupAggregate -> Parallel Seq Scan on sales
Explanation: Uses 4 parallel workers for 4x speedup*/Quick Optimization Checklist
Before Running a Query
- Does the WHERE clause have an index?
- Are JOIN columns indexed?
- Are statistics up to date?
- Is the result set limited?
After EXPLAIN ANALYZE
Check for these warning signs:
| Warning Sign | Meaning | Solution |
|---|---|---|
| Seq Scan on large table | No useful index | Create index |
| Rows Removed by Filter (high) | Low selectivity | More selective predicate |
| actual rows >> estimated | Stale statistics | Run ANALYZE |
| Nested Loop (large outer) | Wrong join type | Check memory, indexes |
| Sort (disk) | Insufficient memory | Increase work_mem |
Quick Fixes
-- 1. Missing indexCREATE INDEX idx_table_column ON table(column);
-- 2. Stale statisticsANALYZE table_name;
-- 3. Need more sort/hash memorySET work_mem = '256MB';
-- 4. Function on indexed columnCREATE INDEX idx_func ON table(FUNCTION(column));
-- 5. Low selectivity OR condition-- Rewrite as UNION ALLSELECT * FROM t WHERE a = 1UNION ALLSELECT * FROM t WHERE b = 2;EXPLAIN Output Formats
Text (Default)
EXPLAIN SELECT * FROM orders;-- Human-readable tree formatJSON (For Tools)
EXPLAIN (FORMAT JSON) SELECT * FROM orders;-- Structured JSON for programmatic analysisYAML (Configuration-Friendly)
EXPLAIN (FORMAT YAML) SELECT * FROM orders;-- Easy to read structured formatAI Mode (Natural Language)
EXPLAIN (AI) SELECT * FROM orders WHERE customer_id = 123;
/*This query searches the orders table for a specific customer.It uses an index on customer_id, making the lookup very fast.
Performance: FAST (estimated 0.1ms)No optimization suggestions - query is already optimal.*/Common Questions
Q: Why isn’t my index being used?
EXPLAIN ANALYZE (WHY_NOT)SELECT * FROM users WHERE age > 20;
/*WHY_NOT Analysis:Index idx_users_age exists but not used because: - Selectivity: 80% (too many rows match) - Sequential scan cost: 1000 - Index scan cost: 3200 (3.2x more expensive)
Suggestion: Index scans are efficient when <20% of rows match.This query returns 80% of rows, so sequential scan is faster.*/Q: How do I optimize a slow JOIN?
-- Check if join columns are indexed\d orders-- Look for idx_orders_customer_id
-- If missing, create indexCREATE INDEX idx_orders_customer_id ON orders(customer_id);
-- Ensure both tables have fresh statisticsANALYZE orders;ANALYZE customers;Q: How do I know if my query is parallel?
EXPLAIN ANALYZE SELECT SUM(amount) FROM sales;
/*Look for: Gather or Gather Merge Workers Planned: 4 Workers Launched: 4 Parallel Seq Scan
If missing, check: SHOW max_parallel_workers_per_gather; SHOW min_parallel_table_scan_size;*/Next Steps
- Read: User Guide for advanced techniques
- Study: Troubleshooting Guide for common issues
- Explore: Query Optimizer Overview for architecture details
- Practice: Run EXPLAIN ANALYZE on your own queries
Quick Reference
Essential Commands
-- Basic planEXPLAIN SELECT ...;
-- With execution statsEXPLAIN ANALYZE SELECT ...;
-- Full detailEXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT ...;
-- AI explanationEXPLAIN (AI) SELECT ...;
-- Why not analysisEXPLAIN ANALYZE (WHY_NOT) SELECT ...;
-- Update statisticsANALYZE table_name;
-- View index usageSELECT * FROM pg_stat_user_indexes;Cost Thresholds
| Cost Range | Performance | Action |
|---|---|---|
| < 100 | Excellent | None needed |
| 100-1000 | Good | Monitor |
| 1000-10000 | Moderate | Consider optimization |
| > 10000 | Poor | Optimize immediately |
Query Optimizer Quick Start - Master query performance in minutes.