HeliosDB Indexing User Guide
HeliosDB Indexing User Guide
Version: 7.0 Last Updated: 2026-01-04
Table of Contents
- When to Use Which Index Type
- Creating Indexes
- Managing Indexes
- Index Statistics and Analysis
- Online Index Rebuild
- Index Maintenance Best Practices
- Performance Impact of Indexes
- Troubleshooting
When to Use Which Index Type
Choosing the right index type is critical for query performance. Use this decision guide to select the optimal index for your use case.
Decision Matrix
| Query Pattern | Recommended Index | Alternative |
|---|---|---|
Equality lookup (=) | Hash | B-tree |
Range query (<, >, BETWEEN) | B-tree | - |
Sorting (ORDER BY) | B-tree | - |
Pattern matching (LIKE 'prefix%') | B-tree | Full-text |
JSONB containment (@>) | GIN | - |
Array operations (&&, @>) | GIN | - |
| Full-text search | Full-text | GIN |
| Geospatial queries | GiST | - |
| IP range lookup | GiST | - |
| Vector similarity (k-NN) | HNSW | IVF |
| High-write workload | LSM-Tree | B-tree |
Detailed Guidelines
Use B-tree When:
- You need range queries on numeric or date columns
- Queries require sorted output
- The column has high cardinality (many unique values)
- You need composite indexes on multiple columns
Use Hash When:
- All queries are exact equality matches
- Column has high cardinality
- You want minimal memory footprint
- Range queries are never needed
Use GIN When:
- Indexing JSONB documents
- Indexing array columns
- Supporting full-text search with tsvector
- Queries use containment operators (
@>,<@,&&)
Use GiST When:
- Working with geometric or spatial data
- Implementing nearest-neighbor queries on non-vector data
- Indexing ranges (tsrange, int4range, etc.)
- Performing IP address range lookups
Use HNSW When:
- Vector similarity search is the primary use case
- Low latency is critical (sub-millisecond)
- Dataset fits in memory
- Accuracy requirements allow approximate results
Use IVF When:
- Vector dataset is very large (millions of vectors)
- Memory is constrained
- Batch processing is acceptable
- Training data is available for clustering
Use LSM-Tree When:
- Write throughput is the bottleneck
- Reads are less frequent than writes
- Data is append-mostly
- Background compaction is acceptable
Creating Indexes
Basic Syntax
CREATE INDEX [CONCURRENTLY] [IF NOT EXISTS] index_nameON table_name [USING method](column_expression [ASC | DESC] [NULLS {FIRST | LAST}], ...)[INCLUDE (column, ...)][WHERE predicate][WITH (storage_parameter = value, ...)];B-tree Index Examples
-- Simple single-column indexCREATE INDEX idx_users_email ON users (email);
-- Composite index (column order matters!)CREATE INDEX idx_orders_cust_date ON orders (customer_id, order_date DESC);
-- Index with INCLUDE for covering queriesCREATE INDEX idx_products_sku ON products (sku) INCLUDE (name, price);
-- Concurrent index creation (non-blocking)CREATE INDEX CONCURRENTLY idx_logs_timestamp ON logs (created_at);Hash Index Examples
-- Hash index for exact lookupsCREATE INDEX idx_sessions_token ON sessions USING HASH (session_token);
-- Hash index on UUID columnCREATE INDEX idx_events_trace_id ON events USING HASH (trace_id);GIN Index Examples
-- GIN index on JSONB columnCREATE INDEX idx_configs_data ON configurations USING GIN (config_data);
-- GIN index on array columnCREATE INDEX idx_articles_tags ON articles USING GIN (tags);
-- GIN index for trigram similarity (requires pg_trgm)CREATE INDEX idx_products_name_trgm ON products USING GIN (name gin_trgm_ops);GiST Index Examples
-- GiST index for geometric dataCREATE INDEX idx_stores_location ON stores USING GIST (location);
-- GiST index for IP rangesCREATE INDEX idx_access_list_ip ON access_list USING GIST (ip_range inet_ops);
-- GiST index for timestamp rangesCREATE INDEX idx_reservations_period ON reservations USING GIST (reservation_period);Vector Index Examples
-- HNSW index for L2 distanceCREATE INDEX idx_docs_embedding_hnsw ON documentsUSING HNSW (embedding vector_l2_ops)WITH (m = 32, ef_construction = 64);
-- HNSW index for cosine similarityCREATE INDEX idx_images_features ON imagesUSING HNSW (features vector_cosine_ops)WITH (m = 48, ef_construction = 100);
-- IVF index for large-scale searchCREATE INDEX idx_products_vectors_ivf ON productsUSING IVFFLAT (embedding vector_l2_ops)WITH (lists = 1000);Full-Text Index Examples
-- Full-text index on single columnCREATE INDEX idx_articles_body_fts ON articles USING FULLTEXT (body);
-- Full-text index on multiple columnsCREATE INDEX idx_docs_fts ON documentsUSING FULLTEXT (title, abstract, content)WITH (language = 'english');Partial Index Examples
-- Index only active recordsCREATE INDEX idx_users_active_email ON users (email)WHERE is_active = true;
-- Index only recent ordersCREATE INDEX idx_orders_pending ON orders (customer_id, order_date)WHERE status IN ('pending', 'processing');
-- Index only non-null valuesCREATE INDEX idx_profiles_phone ON profiles (phone)WHERE phone IS NOT NULL;Expression Index Examples
-- Case-insensitive email searchCREATE INDEX idx_users_email_lower ON users (LOWER(email));
-- JSONB path extractionCREATE INDEX idx_events_user_id ON events ((payload->>'user_id'));
-- Date part extractionCREATE INDEX idx_orders_year_month ON orders ( EXTRACT(YEAR FROM order_date), EXTRACT(MONTH FROM order_date));
-- Computed columnCREATE INDEX idx_orders_total ON orders ((quantity * unit_price));Managing Indexes
Viewing Existing Indexes
-- List all indexes on a tableSELECT indexname, indexdefFROM pg_indexesWHERE tablename = 'orders';
-- View index size and usage statisticsSELECT schemaname, tablename, indexname, idx_scan, idx_tup_read, idx_tup_fetch, pg_size_pretty(pg_relation_size(indexrelid)) AS index_sizeFROM pg_stat_user_indexesORDER BY idx_scan DESC;
-- Find unused indexesSELECT schemaname || '.' || relname AS table, indexrelname AS index, pg_size_pretty(pg_relation_size(i.indexrelid)) AS size, idx_scan AS scansFROM pg_stat_user_indexes uiJOIN pg_index i ON ui.indexrelid = i.indexrelidWHERE idx_scan = 0 AND NOT indisuniqueORDER BY pg_relation_size(i.indexrelid) DESC;Modifying Indexes
-- Rename an indexALTER INDEX idx_old_name RENAME TO idx_new_name;
-- Move index to different tablespaceALTER INDEX idx_orders_date SET TABLESPACE fast_ssd;
-- Change storage parametersALTER INDEX idx_large_table SET (fillfactor = 90);Dropping Indexes
-- Drop a single indexDROP INDEX idx_users_email;
-- Drop if exists (no error if missing)DROP INDEX IF EXISTS idx_temporary;
-- Drop concurrently (non-blocking)DROP INDEX CONCURRENTLY idx_old_index;
-- Drop multiple indexesDROP INDEX idx_a, idx_b, idx_c;Reindexing
-- Rebuild a single indexREINDEX INDEX idx_orders_date;
-- Rebuild all indexes on a tableREINDEX TABLE orders;
-- Rebuild all indexes in a schemaREINDEX SCHEMA public;
-- Concurrent reindex (non-blocking)REINDEX INDEX CONCURRENTLY idx_users_email;Index Statistics and Analysis
Gathering Statistics
-- Analyze a single tableANALYZE orders;
-- Analyze specific columnsANALYZE orders (customer_id, order_date);
-- Verbose analyze with outputANALYZE VERBOSE orders;
-- Analyze all tables in schemaANALYZE;Viewing Index Statistics
-- Index usage statisticsSELECT indexrelname AS index_name, idx_scan AS number_of_scans, idx_tup_read AS tuples_read, idx_tup_fetch AS tuples_fetchedFROM pg_stat_user_indexesWHERE schemaname = 'public'ORDER BY idx_scan DESC;
-- Index size informationSELECT indexname, pg_size_pretty(pg_relation_size(quote_ident(indexname)::regclass)) AS size, pg_relation_size(quote_ident(indexname)::regclass) AS size_bytesFROM pg_indexesWHERE schemaname = 'public'ORDER BY pg_relation_size(quote_ident(indexname)::regclass) DESC;
-- Index bloat estimationSELECT schemaname, tablename, indexname, pg_size_pretty(pg_relation_size(indexrelid)) AS index_size, ROUND(100.0 * pg_relation_size(indexrelid) / NULLIF(pg_relation_size(relid), 0), 1) AS index_to_table_ratioFROM pg_stat_user_indexesJOIN pg_index ON indexrelid = pg_index.indexrelidORDER BY pg_relation_size(indexrelid) DESC;Using EXPLAIN to Verify Index Usage
-- Basic explainEXPLAIN SELECT * FROM orders WHERE customer_id = 12345;
-- Explain with actual timingEXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 12345;
-- Detailed explain with buffersEXPLAIN (ANALYZE, BUFFERS, VERBOSE, FORMAT TEXT)SELECT * FROM ordersWHERE customer_id = 12345 AND order_date > '2025-01-01';
-- JSON format for programmatic analysisEXPLAIN (FORMAT JSON)SELECT * FROM orders WHERE customer_id = 12345;Understanding EXPLAIN Output
-- Example output interpretationEXPLAIN ANALYZESELECT * FROM orders WHERE customer_id = 12345 AND status = 'active';
/*Output:Index Scan using idx_orders_customer on orders (cost=0.43..8.45 rows=1 width=120) Index Cond: (customer_id = 12345) Filter: (status = 'active') Rows Removed by Filter: 3 Actual time: 0.021..0.025 ms Buffers: shared hit=4
Key metrics:- "Index Scan" = Index is being used- "cost=0.43..8.45" = Startup cost..total cost- "Actual time: 0.021..0.025 ms" = Real execution time- "Buffers: shared hit=4" = 4 buffer pages read from cache*/Online Index Rebuild
HeliosDB supports online index rebuilding with zero downtime using a shadow index approach.
How It Works
- Shadow Index Creation: A new index is built with a
_rebuildsuffix - Background Building: Index is populated while original handles queries
- Incremental Sync: Changes during rebuild are captured and applied
- Atomic Cutover: Near-instant swap to new index (<50ms)
- Cleanup: Old index is dropped
Manual Online Rebuild
-- Initiate online rebuildREBUILD INDEX idx_orders_date ONLINE;
-- Check rebuild progressSELECT * FROM heliosdb_index_rebuild_progressWHERE index_name = 'idx_orders_date';
-- Cancel a rebuildCANCEL REBUILD INDEX idx_orders_date;
-- View rebuild statisticsSELECT * FROM heliosdb_rebuild_stats;Monitoring Rebuild Progress
-- Real-time progress monitoringSELECT index_name, shadow_name, status, percent_complete, rows_processed, total_rows, incremental_changes, estimated_completion, current_phaseFROM heliosdb_index_rebuild_progress;
-- Example output:-- index_name | idx_orders_date-- shadow_name | idx_orders_date_rebuild-- status | Building-- percent_complete| 45.5-- rows_processed | 4,550,000-- total_rows | 10,000,000-- incremental_changes | 1,234-- estimated_completion| 2026-01-04 15:32:00-- current_phase | Phase 2: Background buildingAutonomous Rebuild
Configure automatic fragmentation detection and rebuild:
-- Enable autonomous rebuildALTER SYSTEM SET heliosdb.auto_rebuild_enabled = true;ALTER SYSTEM SET heliosdb.fragmentation_threshold = 30; -- 30%ALTER SYSTEM SET heliosdb.rebuild_check_interval = '1 hour';ALTER SYSTEM SET heliosdb.max_concurrent_rebuilds = 2;
-- Reload configurationSELECT pg_reload_conf();Index Maintenance Best Practices
1. Analyze Regularly
-- After bulk data loadsANALYZE large_table;
-- Schedule periodic analysis (via cron or pg_cron)SELECT cron.schedule('analyze-orders', '0 2 * * *', 'ANALYZE orders');2. Monitor Bloat and Fragmentation
-- Create a monitoring viewCREATE VIEW index_health ASSELECT schemaname, tablename, indexrelname AS index_name, idx_scan AS scans, pg_size_pretty(pg_relation_size(indexrelid)) AS size, CASE WHEN idx_scan = 0 THEN 'UNUSED' WHEN idx_scan < 100 THEN 'LOW_USAGE' ELSE 'ACTIVE' END AS statusFROM pg_stat_user_indexesORDER BY pg_relation_size(indexrelid) DESC;3. Remove Unused Indexes
-- Identify candidates for removalSELECT schemaname || '.' || relname AS table, indexrelname AS index, pg_size_pretty(pg_relation_size(indexrelid)) AS wasted_spaceFROM pg_stat_user_indexes uiJOIN pg_index i ON ui.indexrelid = i.indexrelidWHERE idx_scan = 0 AND NOT indisunique AND NOT indisprimary AND pg_relation_size(indexrelid) > 10 * 1024 * 1024; -- > 10MB4. Use Partial Indexes Where Appropriate
-- Replace full index with partial indexDROP INDEX idx_orders_status;
CREATE INDEX idx_orders_pending ON orders (customer_id, order_date)WHERE status = 'pending'; -- Most queries filter for pending5. Optimize Composite Index Column Order
-- Wrong: Low cardinality column firstCREATE INDEX idx_bad ON orders (status, customer_id);
-- Right: High cardinality column firstCREATE INDEX idx_good ON orders (customer_id, status);6. Use INCLUDE for Covering Indexes
-- Avoid table lookups with INCLUDECREATE INDEX idx_orders_covering ON orders (customer_id)INCLUDE (order_date, total_amount, status);
-- Query uses index-only scanSELECT order_date, total_amount, statusFROM ordersWHERE customer_id = 12345;Performance Impact of Indexes
Write Overhead
Every index adds overhead to INSERT, UPDATE, and DELETE operations:
| Index Type | INSERT Overhead | UPDATE Overhead | DELETE Overhead |
|---|---|---|---|
| B-tree | Low-Medium | Medium | Low |
| Hash | Very Low | Low | Very Low |
| GIN | High | High | Medium |
| GiST | Medium | Medium | Medium |
| HNSW | High | High | Medium |
| Full-text | High | High | High |
Guidelines for Index Count
-- Check index count per tableSELECT relname AS table_name, COUNT(indexrelid) AS index_countFROM pg_stat_user_tablesLEFT JOIN pg_index ON indrelid = relidGROUP BY relnameHAVING COUNT(indexrelid) > 5ORDER BY index_count DESC;
-- Recommendation: Generally 3-8 indexes per table is optimal-- More than 10 indexes may indicate over-indexingMemory Considerations
-- Total index memory usageSELECT pg_size_pretty(SUM(pg_relation_size(indexrelid))) AS total_index_sizeFROM pg_stat_user_indexes;
-- Ensure indexes fit in shared_buffers for optimal performanceSHOW shared_buffers;Troubleshooting
Index Not Being Used
Symptom: EXPLAIN shows Sequential Scan instead of Index Scan
Solutions:
-- 1. Check if statistics are currentANALYZE problematic_table;
-- 2. Check if planner cost settings are appropriateSET random_page_cost = 1.1; -- For SSDsSET seq_page_cost = 1.0;
-- 3. Force index usage (for testing only)SET enable_seqscan = off;EXPLAIN ANALYZE SELECT ...;SET enable_seqscan = on;
-- 4. Check selectivitySELECT n_distinct, correlation, null_fracFROM pg_statsWHERE tablename = 'orders' AND attname = 'status';Index Scan Is Slower Than Expected
Symptom: Index scan takes longer than anticipated
Solutions:
-- 1. Check for index bloatSELECT pg_size_pretty(pg_relation_size(indexrelid)) AS index_size, pg_size_pretty(pg_relation_size(relid)) AS table_sizeFROM pg_stat_user_indexesWHERE indexrelname = 'idx_slow';
-- 2. Rebuild if bloatedREINDEX INDEX CONCURRENTLY idx_slow;
-- 3. Check for lock contentionSELECT * FROM pg_stat_activity WHERE wait_event_type = 'Lock';High Write Latency
Symptom: INSERT/UPDATE operations are slow
Solutions:
-- 1. Identify expensive indexesSELECT indexrelname, idx_tup_read, idx_tup_fetchFROM pg_stat_user_indexesORDER BY idx_tup_read DESC;
-- 2. Remove unused indexesDROP INDEX idx_never_used;
-- 3. Consider partial indexesCREATE INDEX idx_partial ON table (col) WHERE condition;Expression Index Not Used
Symptom: Index on expression is ignored
Solutions:
-- Expression must match exactly-- Wrong query (won't use LOWER index):SELECT * FROM users WHERE lower(email) = 'test@example.com';
-- Right query (uses LOWER index):SELECT * FROM users WHERE LOWER(email) = 'test@example.com';
-- Verify expression index definitionSELECT indexdef FROM pg_indexes WHERE indexname = 'idx_users_email_lower';Related Documentation
- README - Indexing overview
- Quick Start - Get started quickly
- EXPLAIN Plan Guide - Query plan analysis
- Performance Tuning - Overall performance optimization
- Full-Text Search - Full-text search tuning
Summary
Effective index management is crucial for database performance. Key takeaways:
- Choose the right index type for your query patterns
- Create indexes strategically - not too few, not too many
- Monitor index usage and remove unused indexes
- Maintain statistics with regular ANALYZE
- Use online rebuild for zero-downtime maintenance
- Verify with EXPLAIN that indexes are being used
Following these practices will ensure optimal query performance while minimizing write overhead and storage costs.