HeliosDB Indexing Overview

Version: 7.0 Last Updated: 2026-01-04

Introduction

HeliosDB provides a comprehensive indexing system designed for high-performance data access across diverse workloads. From traditional B-tree indexes for range queries to advanced vector indexes for AI/ML applications, HeliosDB supports the full spectrum of indexing requirements for modern data-intensive applications.

This document provides an overview of all index types available in HeliosDB, their use cases, and key characteristics.

Index Types at a Glance

Index Type	Best For	Access Pattern	Range Queries	Memory Footprint
B-tree	General purpose	O(log n)	Yes	Medium
Hash	Exact matches	O(1)	No	Low
LSM-Tree	Write-heavy workloads	O(log n)	Yes	Low (tiered)
GIN	Multi-valued columns	O(log n)	Partial	High
GiST	Spatial/geometric data	O(log n)	Yes	Medium
HNSW	Vector similarity	O(log n) approx	N/A	High
IVF	Large-scale vectors	O(n/k) approx	N/A	Medium
Full-Text	Text search	O(log n)	N/A	High

B-tree Indexes

B-tree indexes are the default and most versatile index type in HeliosDB. They maintain sorted data and support efficient equality and range queries.

Key Characteristics

Balanced tree structure with O(log n) lookup, insert, and delete
Supports range queries: <, <=, =, >=, >, BETWEEN
Supports ORDER BY operations efficiently
Ideal for columns with high cardinality

Use Cases

Primary key lookups
Foreign key joins
Date/timestamp range queries
Sorted result retrieval

Example

-- Create a B-tree index (default)
CREATE INDEX idx_orders_date ON orders (order_date);

-- Composite B-tree index
CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date);

-- Query using the index
SELECT * FROM orders
WHERE customer_id = 12345
  AND order_date BETWEEN '2025-01-01' AND '2025-12-31';

Hash Indexes

Hash indexes provide O(1) constant-time lookups for exact equality comparisons. They are optimized for high-speed point queries.

Key Characteristics

O(1) lookup time for equality comparisons
Does not support range queries
Lower memory overhead than B-trees for equality-only access
Ideal for lookup tables and exact-match scenarios

Use Cases

Session ID lookups
Cache key lookups
Unique identifier searches
Configuration value retrieval

Example

-- Create a hash index
CREATE INDEX idx_sessions_id ON sessions USING HASH (session_id);

-- Query using the hash index
SELECT user_id, created_at FROM sessions WHERE session_id = 'abc123xyz';

LSM-Tree Indexes

LSM-Tree (Log-Structured Merge-Tree) indexes are optimized for write-heavy workloads. They buffer writes in memory and periodically flush to sorted disk structures.

Key Characteristics

Optimized for high write throughput
Writes buffered in memtable before flushing to SSTables
Background compaction merges and optimizes disk structures
Supports range queries via sorted SSTables

Use Cases

Event logging and time-series data
IoT sensor data ingestion
High-velocity insert workloads
Append-heavy access patterns

Example

-- Create an LSM-Tree index for high-write workloads
CREATE INDEX idx_events_timestamp ON events USING LSMTREE (timestamp);

-- Configure memtable size
ALTER INDEX idx_events_timestamp SET (memtable_size_mb = 64);

GIN (Generalized Inverted) Indexes

GIN indexes are designed for columns containing multiple values per row, such as arrays, JSONB documents, and full-text search vectors.

Key Characteristics

Inverted index structure mapping values to rows
Efficient for containment queries (@>, <@, &&)
Supports arrays, JSONB, and tsvector
Higher build and update cost than B-tree

Use Cases

JSONB document queries
Array containment searches
Tag-based filtering
Full-text search

Example

-- Create a GIN index on JSONB column
CREATE INDEX idx_products_attrs ON products USING GIN (attributes);

-- Query JSONB with the index
SELECT * FROM products
WHERE attributes @> '{"category": "electronics", "in_stock": true}';

-- Create GIN index on array column
CREATE INDEX idx_posts_tags ON posts USING GIN (tags);

-- Query array containment
SELECT * FROM posts WHERE tags && ARRAY['database', 'performance'];

GiST (Generalized Search Tree) Indexes

GiST indexes support complex data types like geometric shapes, ranges, and spatial data. They enable efficient spatial queries and nearest-neighbor searches.

Key Characteristics

Supports geometric and range data types
Enables spatial queries (contains, overlaps, nearest-neighbor)
Balanced tree with pluggable operators
Used for PostGIS-style geospatial queries

Use Cases

Geospatial location queries
Bounding box searches
IP range lookups
Temporal range overlaps

Example

-- Create a GiST index for geometric data
CREATE INDEX idx_locations_geo ON locations USING GIST (coordinates);

-- Spatial containment query
SELECT name FROM locations
WHERE coordinates && ST_MakeEnvelope(-122.5, 37.5, -122.0, 38.0);

-- Create GiST index for IP ranges
CREATE INDEX idx_ip_ranges ON network_blocks USING GIST (ip_range);

-- Query IP containment
SELECT block_name FROM network_blocks WHERE ip_range >> '192.168.1.100';

Vector Indexes (HNSW)

HNSW (Hierarchical Navigable Small World) indexes provide fast approximate nearest-neighbor search for high-dimensional vector data, essential for AI/ML applications.

Key Characteristics

Graph-based structure with navigable small-world properties
Sub-linear search time for high-dimensional vectors
Configurable accuracy vs speed tradeoff
Supports cosine, L2, and inner product distance metrics

Configuration Parameters

Parameter	Description	Default
`m`	Connections per layer	32
`ef_construction`	Build-time beam width	40
`ef_search`	Query-time beam width	10

Use Cases

Semantic search
Image similarity
Recommendation systems
RAG (Retrieval-Augmented Generation)

Example

-- Create HNSW index for vector embeddings
CREATE INDEX idx_embeddings_hnsw ON documents USING HNSW (embedding vector_l2_ops)
WITH (m = 32, ef_construction = 64);

-- Similarity search
SELECT id, title, embedding <-> '[0.1, 0.2, 0.3, ...]'::vector AS distance
FROM documents
ORDER BY embedding <-> '[0.1, 0.2, 0.3, ...]'::vector
LIMIT 10;

Vector Indexes (IVF)

IVF (Inverted File) indexes partition vectors into clusters for efficient large-scale similarity search. They offer a good balance between speed and memory usage.

Key Characteristics

Cluster-based partitioning using k-means
Searches only nearby clusters for efficiency
Configurable number of clusters (nlist) and probes (nprobe)
Better for very large vector datasets than HNSW

Configuration Parameters

Parameter	Description	Default
`nlist`	Number of clusters	100
`nprobe`	Clusters to search	10

Use Cases

Large-scale vector search (millions of vectors)
Memory-constrained environments
Batch similarity processing

Example

-- Create IVF index with product quantization
CREATE INDEX idx_images_ivf ON images USING IVFPQ (feature_vector)
WITH (nlist = 1000, nprobe = 20);

-- Adjust probe count for accuracy
SET ivfpq.nprobe = 50;

-- Similarity search
SELECT id, image_url
FROM images
ORDER BY feature_vector <-> query_vector
LIMIT 20;

Full-Text Search Indexes

Full-text search indexes enable efficient text search with support for stemming, stop words, ranking, and phrase matching.

Key Characteristics

Tokenization and stemming for natural language processing
Stop word filtering to reduce noise
Relevance ranking with ts_rank functions
Boolean and phrase queries

Use Cases

Document search
Product catalog search
Knowledge base queries
Log analysis

Example

-- Create full-text index
CREATE INDEX idx_articles_fts ON articles USING FULLTEXT (title, content);

-- Basic search with ranking
SELECT id, title, ts_rank(document, query) AS relevance
FROM articles,
     websearch_to_tsquery('english', 'database performance') AS query
WHERE document @@ query
ORDER BY relevance DESC
LIMIT 10;

-- Phrase search
SELECT * FROM articles
WHERE content @@ phraseto_tsquery('english', 'query optimization');

Partial Indexes

Partial indexes index only a subset of rows matching a WHERE condition, reducing index size and improving performance for filtered queries.

Key Characteristics

Smaller index size than full indexes
Faster maintenance for INSERT/UPDATE/DELETE
Optimized for specific query patterns

Example

-- Index only active orders
CREATE INDEX idx_orders_active ON orders (customer_id, order_date)
WHERE status = 'active';

-- Index only recent data
CREATE INDEX idx_logs_recent ON logs (created_at, severity)
WHERE created_at > CURRENT_DATE - INTERVAL '30 days';

Expression Indexes

Expression indexes index the result of an expression or function, enabling efficient queries on computed values.

Key Characteristics

Indexes computed values rather than raw column data
Supports functions, operators, and expressions
Essential for case-insensitive searches and JSON path queries

Example

-- Case-insensitive search
CREATE INDEX idx_users_email_lower ON users (LOWER(email));

-- Query using the expression index
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';

-- JSONB path expression index
CREATE INDEX idx_orders_customer_name ON orders ((data->>'customer_name'));

-- Date extraction index
CREATE INDEX idx_events_year_month ON events (EXTRACT(YEAR FROM event_date), EXTRACT(MONTH FROM event_date));

Composite Indexes

Composite indexes index multiple columns together, optimizing queries that filter or sort on multiple columns.

Key Characteristics

Column order matters: leftmost prefix is used for lookups
Supports multiple access patterns with one index
Reduces index count and maintenance overhead

Example

-- Composite index for common query pattern
CREATE INDEX idx_orders_cust_date_status ON orders (customer_id, order_date DESC, status);

-- This index supports:
-- WHERE customer_id = ?
-- WHERE customer_id = ? AND order_date = ?
-- WHERE customer_id = ? AND order_date BETWEEN ? AND ?
-- ORDER BY customer_id, order_date DESC

User Guide - Comprehensive guide to creating and managing indexes
Quick Start - Get started with indexing in 5 minutes
Online Rebuild - Zero-downtime index maintenance
WAL Durability - Index crash recovery implementation
Full-Text Search Tuning - Advanced full-text search configuration
Vector Search - Multimodal vector search guide

Performance Considerations

Index Selection Guidelines

Start with B-tree for general-purpose indexing
Use Hash only for pure equality lookups with no range needs
Choose LSM-Tree for write-heavy workloads with eventual reads
Select HNSW for vector search with low latency requirements
Pick IVF for large-scale vector search with memory constraints
Apply GIN for JSONB and array columns
Use GiST for spatial and range data types

Index Maintenance

Monitor index bloat with heliosdb_index_stats view
Rebuild fragmented indexes using online rebuild for zero downtime
Update statistics regularly with ANALYZE
Remove unused indexes to reduce write overhead

Summary

HeliosDB’s indexing system provides:

9 index types covering all major access patterns
Online index rebuild with zero downtime
WAL-based durability for crash recovery
Autonomous index recommendations via the Index Advisor
GPU-accelerated vector search for AI/ML workloads

Choose the right index type for your workload, and leverage HeliosDB’s advanced features like partial indexes, expression indexes, and online maintenance for optimal performance.

HeliosDB Indexing Overview

HeliosDB Indexing Overview

Introduction

Index Types at a Glance

B-tree Indexes

Key Characteristics

Use Cases

Example

Hash Indexes

Key Characteristics

Use Cases

Example

LSM-Tree Indexes

Key Characteristics

Use Cases

Example

GIN (Generalized Inverted) Indexes

Key Characteristics

Use Cases

Example

GiST (Generalized Search Tree) Indexes

Key Characteristics

Use Cases

Example

Vector Indexes (HNSW)

Key Characteristics

Configuration Parameters

Use Cases

Example

Vector Indexes (IVF)

Key Characteristics

Configuration Parameters

Use Cases

Example

Full-Text Search Indexes

Key Characteristics

Use Cases

Example

Partial Indexes

Key Characteristics

Example

Expression Indexes

Key Characteristics

Example

Composite Indexes

Key Characteristics

Example

Related Documentation

Performance Considerations

Index Selection Guidelines

Index Maintenance

Summary