HeliosDB Indexing Overview
HeliosDB Indexing Overview
Version: 7.0 Last Updated: 2026-01-04
Introduction
HeliosDB provides a comprehensive indexing system designed for high-performance data access across diverse workloads. From traditional B-tree indexes for range queries to advanced vector indexes for AI/ML applications, HeliosDB supports the full spectrum of indexing requirements for modern data-intensive applications.
This document provides an overview of all index types available in HeliosDB, their use cases, and key characteristics.
Index Types at a Glance
| Index Type | Best For | Access Pattern | Range Queries | Memory Footprint |
|---|---|---|---|---|
| B-tree | General purpose | O(log n) | Yes | Medium |
| Hash | Exact matches | O(1) | No | Low |
| LSM-Tree | Write-heavy workloads | O(log n) | Yes | Low (tiered) |
| GIN | Multi-valued columns | O(log n) | Partial | High |
| GiST | Spatial/geometric data | O(log n) | Yes | Medium |
| HNSW | Vector similarity | O(log n) approx | N/A | High |
| IVF | Large-scale vectors | O(n/k) approx | N/A | Medium |
| Full-Text | Text search | O(log n) | N/A | High |
B-tree Indexes
B-tree indexes are the default and most versatile index type in HeliosDB. They maintain sorted data and support efficient equality and range queries.
Key Characteristics
- Balanced tree structure with O(log n) lookup, insert, and delete
- Supports range queries:
<,<=,=,>=,>,BETWEEN - Supports ORDER BY operations efficiently
- Ideal for columns with high cardinality
Use Cases
- Primary key lookups
- Foreign key joins
- Date/timestamp range queries
- Sorted result retrieval
Example
-- Create a B-tree index (default)CREATE INDEX idx_orders_date ON orders (order_date);
-- Composite B-tree indexCREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date);
-- Query using the indexSELECT * FROM ordersWHERE customer_id = 12345 AND order_date BETWEEN '2025-01-01' AND '2025-12-31';Hash Indexes
Hash indexes provide O(1) constant-time lookups for exact equality comparisons. They are optimized for high-speed point queries.
Key Characteristics
- O(1) lookup time for equality comparisons
- Does not support range queries
- Lower memory overhead than B-trees for equality-only access
- Ideal for lookup tables and exact-match scenarios
Use Cases
- Session ID lookups
- Cache key lookups
- Unique identifier searches
- Configuration value retrieval
Example
-- Create a hash indexCREATE INDEX idx_sessions_id ON sessions USING HASH (session_id);
-- Query using the hash indexSELECT user_id, created_at FROM sessions WHERE session_id = 'abc123xyz';LSM-Tree Indexes
LSM-Tree (Log-Structured Merge-Tree) indexes are optimized for write-heavy workloads. They buffer writes in memory and periodically flush to sorted disk structures.
Key Characteristics
- Optimized for high write throughput
- Writes buffered in memtable before flushing to SSTables
- Background compaction merges and optimizes disk structures
- Supports range queries via sorted SSTables
Use Cases
- Event logging and time-series data
- IoT sensor data ingestion
- High-velocity insert workloads
- Append-heavy access patterns
Example
-- Create an LSM-Tree index for high-write workloadsCREATE INDEX idx_events_timestamp ON events USING LSMTREE (timestamp);
-- Configure memtable sizeALTER INDEX idx_events_timestamp SET (memtable_size_mb = 64);GIN (Generalized Inverted) Indexes
GIN indexes are designed for columns containing multiple values per row, such as arrays, JSONB documents, and full-text search vectors.
Key Characteristics
- Inverted index structure mapping values to rows
- Efficient for containment queries (
@>,<@,&&) - Supports arrays, JSONB, and tsvector
- Higher build and update cost than B-tree
Use Cases
- JSONB document queries
- Array containment searches
- Tag-based filtering
- Full-text search
Example
-- Create a GIN index on JSONB columnCREATE INDEX idx_products_attrs ON products USING GIN (attributes);
-- Query JSONB with the indexSELECT * FROM productsWHERE attributes @> '{"category": "electronics", "in_stock": true}';
-- Create GIN index on array columnCREATE INDEX idx_posts_tags ON posts USING GIN (tags);
-- Query array containmentSELECT * FROM posts WHERE tags && ARRAY['database', 'performance'];GiST (Generalized Search Tree) Indexes
GiST indexes support complex data types like geometric shapes, ranges, and spatial data. They enable efficient spatial queries and nearest-neighbor searches.
Key Characteristics
- Supports geometric and range data types
- Enables spatial queries (contains, overlaps, nearest-neighbor)
- Balanced tree with pluggable operators
- Used for PostGIS-style geospatial queries
Use Cases
- Geospatial location queries
- Bounding box searches
- IP range lookups
- Temporal range overlaps
Example
-- Create a GiST index for geometric dataCREATE INDEX idx_locations_geo ON locations USING GIST (coordinates);
-- Spatial containment querySELECT name FROM locationsWHERE coordinates && ST_MakeEnvelope(-122.5, 37.5, -122.0, 38.0);
-- Create GiST index for IP rangesCREATE INDEX idx_ip_ranges ON network_blocks USING GIST (ip_range);
-- Query IP containmentSELECT block_name FROM network_blocks WHERE ip_range >> '192.168.1.100';Vector Indexes (HNSW)
HNSW (Hierarchical Navigable Small World) indexes provide fast approximate nearest-neighbor search for high-dimensional vector data, essential for AI/ML applications.
Key Characteristics
- Graph-based structure with navigable small-world properties
- Sub-linear search time for high-dimensional vectors
- Configurable accuracy vs speed tradeoff
- Supports cosine, L2, and inner product distance metrics
Configuration Parameters
| Parameter | Description | Default |
|---|---|---|
m | Connections per layer | 32 |
ef_construction | Build-time beam width | 40 |
ef_search | Query-time beam width | 10 |
Use Cases
- Semantic search
- Image similarity
- Recommendation systems
- RAG (Retrieval-Augmented Generation)
Example
-- Create HNSW index for vector embeddingsCREATE INDEX idx_embeddings_hnsw ON documents USING HNSW (embedding vector_l2_ops)WITH (m = 32, ef_construction = 64);
-- Similarity searchSELECT id, title, embedding <-> '[0.1, 0.2, 0.3, ...]'::vector AS distanceFROM documentsORDER BY embedding <-> '[0.1, 0.2, 0.3, ...]'::vectorLIMIT 10;Vector Indexes (IVF)
IVF (Inverted File) indexes partition vectors into clusters for efficient large-scale similarity search. They offer a good balance between speed and memory usage.
Key Characteristics
- Cluster-based partitioning using k-means
- Searches only nearby clusters for efficiency
- Configurable number of clusters (nlist) and probes (nprobe)
- Better for very large vector datasets than HNSW
Configuration Parameters
| Parameter | Description | Default |
|---|---|---|
nlist | Number of clusters | 100 |
nprobe | Clusters to search | 10 |
Use Cases
- Large-scale vector search (millions of vectors)
- Memory-constrained environments
- Batch similarity processing
Example
-- Create IVF index with product quantizationCREATE INDEX idx_images_ivf ON images USING IVFPQ (feature_vector)WITH (nlist = 1000, nprobe = 20);
-- Adjust probe count for accuracySET ivfpq.nprobe = 50;
-- Similarity searchSELECT id, image_urlFROM imagesORDER BY feature_vector <-> query_vectorLIMIT 20;Full-Text Search Indexes
Full-text search indexes enable efficient text search with support for stemming, stop words, ranking, and phrase matching.
Key Characteristics
- Tokenization and stemming for natural language processing
- Stop word filtering to reduce noise
- Relevance ranking with ts_rank functions
- Boolean and phrase queries
Use Cases
- Document search
- Product catalog search
- Knowledge base queries
- Log analysis
Example
-- Create full-text indexCREATE INDEX idx_articles_fts ON articles USING FULLTEXT (title, content);
-- Basic search with rankingSELECT id, title, ts_rank(document, query) AS relevanceFROM articles, websearch_to_tsquery('english', 'database performance') AS queryWHERE document @@ queryORDER BY relevance DESCLIMIT 10;
-- Phrase searchSELECT * FROM articlesWHERE content @@ phraseto_tsquery('english', 'query optimization');Partial Indexes
Partial indexes index only a subset of rows matching a WHERE condition, reducing index size and improving performance for filtered queries.
Key Characteristics
- Smaller index size than full indexes
- Faster maintenance for INSERT/UPDATE/DELETE
- Optimized for specific query patterns
Example
-- Index only active ordersCREATE INDEX idx_orders_active ON orders (customer_id, order_date)WHERE status = 'active';
-- Index only recent dataCREATE INDEX idx_logs_recent ON logs (created_at, severity)WHERE created_at > CURRENT_DATE - INTERVAL '30 days';Expression Indexes
Expression indexes index the result of an expression or function, enabling efficient queries on computed values.
Key Characteristics
- Indexes computed values rather than raw column data
- Supports functions, operators, and expressions
- Essential for case-insensitive searches and JSON path queries
Example
-- Case-insensitive searchCREATE INDEX idx_users_email_lower ON users (LOWER(email));
-- Query using the expression indexSELECT * FROM users WHERE LOWER(email) = 'john@example.com';
-- JSONB path expression indexCREATE INDEX idx_orders_customer_name ON orders ((data->>'customer_name'));
-- Date extraction indexCREATE INDEX idx_events_year_month ON events (EXTRACT(YEAR FROM event_date), EXTRACT(MONTH FROM event_date));Composite Indexes
Composite indexes index multiple columns together, optimizing queries that filter or sort on multiple columns.
Key Characteristics
- Column order matters: leftmost prefix is used for lookups
- Supports multiple access patterns with one index
- Reduces index count and maintenance overhead
Example
-- Composite index for common query patternCREATE INDEX idx_orders_cust_date_status ON orders (customer_id, order_date DESC, status);
-- This index supports:-- WHERE customer_id = ?-- WHERE customer_id = ? AND order_date = ?-- WHERE customer_id = ? AND order_date BETWEEN ? AND ?-- ORDER BY customer_id, order_date DESCRelated Documentation
- User Guide - Comprehensive guide to creating and managing indexes
- Quick Start - Get started with indexing in 5 minutes
- Online Rebuild - Zero-downtime index maintenance
- WAL Durability - Index crash recovery implementation
- Full-Text Search Tuning - Advanced full-text search configuration
- Vector Search - Multimodal vector search guide
Performance Considerations
Index Selection Guidelines
- Start with B-tree for general-purpose indexing
- Use Hash only for pure equality lookups with no range needs
- Choose LSM-Tree for write-heavy workloads with eventual reads
- Select HNSW for vector search with low latency requirements
- Pick IVF for large-scale vector search with memory constraints
- Apply GIN for JSONB and array columns
- Use GiST for spatial and range data types
Index Maintenance
- Monitor index bloat with
heliosdb_index_statsview - Rebuild fragmented indexes using online rebuild for zero downtime
- Update statistics regularly with
ANALYZE - Remove unused indexes to reduce write overhead
Summary
HeliosDB’s indexing system provides:
- 9 index types covering all major access patterns
- Online index rebuild with zero downtime
- WAL-based durability for crash recovery
- Autonomous index recommendations via the Index Advisor
- GPU-accelerated vector search for AI/ML workloads
Choose the right index type for your workload, and leverage HeliosDB’s advanced features like partial indexes, expression indexes, and online maintenance for optimal performance.