
Purpose-Built Performance

HeliosCore is a native storage engine designed specifically for HeliosDB's unique capabilities: database branching, vector search, and time-travel queries. Unlike generic key-value stores that require workarounds for these features, HeliosCore was built from the ground up with them as first-class primitives.

  • No compromises — Not built on top of RocksDB, LevelDB, or any generic KV store. Every design decision is purpose-made for HeliosDB's workload patterns.
  • Branch-native — Copy-on-write semantics are built into the storage layer itself, not bolted on as an afterthought.
  • Vector-aware I/O — Storage layout optimized for high-dimensional vector data access patterns alongside traditional row/column data.
  • Time-travel native — Multi-version storage designed for efficient historical data access without the overhead of full snapshots.
SQL
-- Features powered by HeliosCore's
-- purpose-built storage layer

-- Zero-cost branch creation
CREATE BRANCH feature_auth
    FROM main;

-- Native time-travel queries
SELECT * FROM users
    AS OF '2026-01-15 09:00:00';

-- Vector search with optimized I/O
SELECT content,
       embedding <=> $query AS distance
FROM documents
ORDER BY distance
LIMIT 10;

-- All three features leverage the same
-- underlying storage primitives

Direct I/O (O_DIRECT)

HeliosCore bypasses the operating system's page cache entirely using O_DIRECT, giving the storage engine complete control over what data lives in memory. This eliminates double-buffering overhead and delivers predictable, consistent latency regardless of OS memory pressure.

  • Predictable latency — No surprise evictions from the OS page cache. HeliosCore decides what stays in memory based on actual workload patterns, not OS heuristics.
  • Eliminate double-buffering — Data is never stored twice (once in application buffer, once in OS cache). Memory is used efficiently for actual work.
  • Consistent performance — P99 latency stays tight even under memory pressure. No GC pauses, no page cache thrashing.
  • Better memory utilization — The freed OS page cache memory is available for HeliosCore's own intelligent buffer pool, vector indexes, and query processing.
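
To make the mechanism concrete, here is a minimal C sketch of a direct read at the syscall level. It is illustrative only, not HeliosCore source: Linux-specific, with an assumed file name and 4KB block size.
C
/* Minimal sketch of a direct read that bypasses the OS page cache. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BLOCK 4096  /* O_DIRECT requires block-aligned buffers, offsets, and sizes */

int main(void) {
    int fd = open("data.page", O_RDONLY | O_DIRECT);  /* hypothetical file */
    if (fd < 0) { perror("open"); return 1; }

    void *buf;
    /* The buffer itself must be aligned to the logical block size */
    if (posix_memalign(&buf, BLOCK, BLOCK) != 0) { close(fd); return 1; }

    /* The kernel DMAs straight into this buffer; no second copy lands
       in the OS page cache, so the engine's own buffer pool is the
       single authority on what stays in memory. */
    ssize_t n = pread(fd, buf, BLOCK, 0);
    if (n < 0) perror("pread");
    else printf("read %zd bytes directly from device\n", n);

    free(buf);
    close(fd);
    return 0;
}
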
Performance Profile
# Latency comparison under load
# (synthetic benchmark, 8-core host, NVMe SSD)

With OS Page Cache (buffered I/O):
  P50 latency:  0.8ms
  P99 latency:  12.4ms   # spikes from eviction
  P999 latency: 45.2ms   # tail latency

With HeliosCore O_DIRECT:
  P50 latency:  0.6ms
  P99 latency:  1.8ms   # predictable
  P999 latency: 3.1ms   # tight tail

Key insight:
  Not just a faster median - dramatically
  better tail latency. No surprise
  stalls from OS memory management.

Optimized for Modern Hardware

HeliosCore is designed for the hardware of today and tomorrow. It takes full advantage of NVMe SSDs, large memory capacities, and multi-core processors — capabilities that legacy storage engines (designed in the HDD era) cannot fully exploit.

  • NVMe-optimized I/O — Parallel I/O submission with queue depths matched to NVMe capabilities. At high queue depth, random reads approach sequential throughput on modern SSDs.
  • Multi-core parallelism — Lock-free data structures and work-stealing thread pools ensure all cores are utilized. No single-threaded bottlenecks.
  • Large memory awareness — Efficient use of 64GB+ memory configurations with tiered buffer management. Hot data stays in-memory automatically.
  • SIMD acceleration — Vectorized operations for compression, checksumming, and vector distance calculations using AVX2/AVX-512 where available.
  • Cache-line friendly — Data structures are aligned to CPU cache lines to minimize cache misses and false sharing.
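
As an illustration of the SIMD bullet above, here is a hedged sketch of an AVX2 squared-L2 distance kernel, the kind of hot path a vector-aware engine accelerates. It assumes 32-byte-aligned inputs and a dimension divisible by 8, and it is not HeliosCore source (compile with -mavx2 -mfma).
C
#include <immintrin.h>
#include <stdio.h>

/* Squared L2 distance, 8 floats per iteration. */
float squared_l2(const float *a, const float *b, int len) {
    __m256 acc = _mm256_setzero_ps();
    for (int i = 0; i < len; i += 8) {
        __m256 va = _mm256_load_ps(a + i);   /* requires 32-byte alignment */
        __m256 vb = _mm256_load_ps(b + i);
        __m256 d  = _mm256_sub_ps(va, vb);
        acc = _mm256_fmadd_ps(d, d, acc);    /* acc += d * d */
    }
    /* Horizontal sum of the 8 accumulator lanes */
    __m128 lo = _mm256_castps256_ps128(acc);
    __m128 hi = _mm256_extractf128_ps(acc, 1);
    lo = _mm_add_ps(lo, hi);
    lo = _mm_hadd_ps(lo, lo);
    lo = _mm_hadd_ps(lo, lo);
    return _mm_cvtss_f32(lo);
}

int main(void) {
    _Alignas(32) float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    _Alignas(32) float b[8] = {0};
    printf("%.1f\n", squared_l2(a, b, 8));  /* 1+4+...+64 = 204.0 */
    return 0;
}
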
Architecture
# HeliosCore hardware utilization

NVMe SSD Layer
  Queue depth:     Multi-queue I/O submission
  I/O pattern:     Aligned, direct, parallel
  Read-ahead:      Workload-adaptive prefetch

CPU Layer
  Threading:       Work-stealing thread pool
  SIMD:            AVX2/AVX-512 for hot paths
  Lock strategy:   Lock-free where possible

Memory Layer
  Buffer pool:     Workload-aware eviction
  Allocation:      Arena-based, zero-copy
  Page sizes:      Adaptive (4KB - 2MB)

Result:
  Scales linearly with hardware.
  More cores = more throughput.
  More NVMe = more IOPS.
  More RAM = larger working set.

Intelligent Memory Management

HeliosCore implements a custom buffer pool with workload-aware eviction. Instead of simple LRU, the buffer pool learns from access patterns and adapts its caching strategy to your specific workload.

  • Workload-aware eviction — Goes beyond LRU. Considers access frequency, recency, and cost-to-reload when deciding what to evict. Scan-resistant by design.
  • Adaptive caching — Automatically adjusts the ratio between row data, index data, and vector data in the buffer pool based on observed query patterns.
  • Hot/warm/cold tiering — Frequently accessed pages are promoted to the hot tier, where they are most resistant to eviction. Cold pages are aged out gracefully.
  • Memory-mapped indexes — HNSW vector indexes and B-tree indexes can be memory-mapped for instant startup without warming.
  • Configurable memory budget — Set a total memory limit and HeliosCore manages all allocations within that budget, including query execution memory.
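
A minimal sketch of what a workload-aware eviction score could look like. The weighting, decay constant, and field names are assumptions for illustration, not HeliosCore's actual policy.
C
#include <math.h>
#include <stdint.h>

typedef struct {
    uint64_t last_access_us;  /* recency */
    uint32_t access_count;    /* frequency */
    uint32_t reload_cost_us;  /* estimated cost to re-read from disk */
} page_stats;

/* Lower score = better eviction candidate. Pure LRU looks only at
   recency; weighing in frequency makes the pool scan-resistant, and
   reload cost keeps expensive-to-rebuild pages (e.g. decompressed
   vector blocks) resident longer. */
double eviction_score(const page_stats *p, uint64_t now_us) {
    double idle_s    = (now_us - p->last_access_us) / 1e6;
    double recency   = exp(-idle_s / 300.0);           /* ~5-minute decay */
    double frequency = log1p((double)p->access_count);
    double cost      = 1.0 + p->reload_cost_us / 1000.0;
    return recency * frequency * cost;
}
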
heliosdb.toml
# Buffer pool configuration
[storage.buffer_pool]
size = "8GB"
eviction_policy = "adaptive"

# Memory budget breakdown
# (auto-adjusted based on workload)
row_data_ratio = 0.40
index_data_ratio = 0.30
vector_data_ratio = 0.20
query_memory_ratio = 0.10

# Warm-up configuration
[storage.warmup]
enabled = true
strategy = "access_pattern"
# Preloads hot pages on startup
# based on historical access logs

# Total memory limit
[memory]
max_total = "12GB"
oom_action = "evict_and_retry"

Compression-First Architecture

HeliosCore treats compression as a first-class feature, not an afterthought. Per-column adaptive compression automatically selects the best algorithm for each data type, achieving 70-90% storage reduction without manual tuning.

  • Per-column adaptive — Each column uses the compression algorithm best suited to its data type. Integers get different treatment than text, timestamps, or vectors.
  • Automatic algorithm selection — HeliosCore profiles sample data and picks the best compressor: ZSTD for general text, LZ4 for speed-critical paths, ALP for floating-point, FSST for short strings.
  • Compression-aware queries — Many predicates can be evaluated directly on compressed data without decompression, reducing both I/O and CPU work.
  • Vector quantization — 384x compression for vector embeddings using Product Quantization, enabling billion-scale vector search on standard hardware.
  • Dictionary encoding — Automatic dictionary encoding for low-cardinality columns with in-place dictionary lookup.
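
To ground the integer case from the profile below, here is a small C sketch of the delta step behind delta + bit-packing: monotonic IDs become tiny deltas, which then pack into a few bits each. The values and helper are illustrative.
C
#include <stdint.h>
#include <stdio.h>

/* Bits needed to represent the largest delta = packed width per value */
static unsigned bits_needed(uint64_t v) {
    unsigned b = 0;
    while (v) { b++; v >>= 1; }
    return b ? b : 1;
}

int main(void) {
    uint64_t ids[] = {100001, 100003, 100004, 100007, 100011};
    int n = 5;

    uint64_t max_delta = 0;
    for (int i = 1; i < n; i++) {
        uint64_t d = ids[i] - ids[i - 1];
        if (d > max_delta) max_delta = d;
    }

    unsigned w      = bits_needed(max_delta);          /* 3 bits here */
    unsigned raw    = 64 * n;                          /* raw uint64 cost */
    unsigned packed = 64 + w * (unsigned)(n - 1);      /* base + deltas */
    printf("raw: %u bits, packed: %u bits (~%ux smaller)\n",
           raw, packed, raw / packed);
    return 0;
}
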
Compression Profile
# Per-column adaptive compression
# (automatically selected by HeliosCore)

Column: user_id (INTEGER)
  Algorithm:  Delta + Bit-packing
  Ratio:      12:1

Column: email (VARCHAR)
  Algorithm:  FSST (short strings)
  Ratio:      4:1

Column: created_at (TIMESTAMP)
  Algorithm:  Delta-of-delta + ZSTD
  Ratio:      20:1

Column: description (TEXT)
  Algorithm:  ZSTD (level 3)
  Ratio:      5:1

Column: embedding (VECTOR(768))
  Algorithm:  Product Quantization
  Ratio:      384:1

Column: status (ENUM)
  Algorithm:  Dictionary encoding
  Ratio:      32:1

# Overall storage reduction: ~80%

Branch-Aware Storage

HeliosCore's storage layer natively understands database branching. Copy-on-write at the storage level means branch creation is instantaneous regardless of database size, and branches share unchanged data automatically.

  • Zero-cost branch creation — Creating a branch is a metadata-only operation. No data is copied, regardless of whether the database is 1MB or 1TB.
  • Copy-on-write pages — Modified pages are written to new locations. Original pages remain untouched for the parent branch. Shared pages are reference-counted.
  • Efficient merging — Branch merges operate at the page level, not row level. Conflict detection is fast and precise.
  • Space-efficient — A branch that modifies 1% of the data only uses 1% additional storage, not a full copy.
  • Concurrent branches — Multiple branches can be read and written concurrently without interference, each with its own MVCC timeline.
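
A simplified sketch of reference-counted copy-on-write pages, as described above. A real engine versions the page-mapping tree itself so branch creation is O(1) metadata; the flat pointer array here is an assumption that keeps the sketch short.
C
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint32_t refcount;   /* how many branches point at this page */
    uint8_t  data[4096];
} page;

/* Branch creation: copy the page *table* (pointers), not the pages.
   Zero data is copied, regardless of database size. */
page **branch_create(page **parent, int n_pages) {
    page **child = malloc(n_pages * sizeof(page *));
    for (int i = 0; i < n_pages; i++) {
        child[i] = parent[i];
        child[i]->refcount++;          /* page is now shared */
    }
    return child;
}

/* First write on a branch: clone only the touched page (len <= 4096).
   The parent branch keeps the original, untouched. */
void branch_write(page **branch, int i, const void *buf, size_t len) {
    if (branch[i]->refcount > 1) {     /* shared: copy before write */
        page *copy = malloc(sizeof(page));
        memcpy(copy, branch[i], sizeof(page));
        branch[i]->refcount--;
        copy->refcount = 1;
        branch[i] = copy;
    }
    memcpy(branch[i]->data, buf, len);
}
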
SQL
-- Create a branch (instant, any DB size)
CREATE BRANCH feature_auth
    FROM main;

-- Switch to branch and make changes
USE BRANCH feature_auth;

ALTER TABLE users
    ADD COLUMN role VARCHAR(50);

INSERT INTO users (name, role)
VALUES ('admin', 'superuser');

-- Main branch is completely unaffected
USE BRANCH main;
SELECT * FROM users;
-- No 'role' column here

-- Merge when ready
MERGE BRANCH feature_auth
    INTO main;

-- Storage: only modified pages
-- are stored separately.
-- Shared data = zero overhead.

Crash Recovery

HeliosCore provides write-ahead logging with instant crash recovery. Data durability is guaranteed even during unexpected failures — power loss, kernel panics, or hardware failures. Recovery is automatic and fast.

  • Write-ahead logging (WAL) — All changes are written to the WAL before being applied to data files. Committed transactions are never lost.
  • Instant recovery — On restart after a crash, HeliosCore replays only the WAL entries since the last checkpoint. Recovery time is proportional to the WAL size, not the database size.
  • Checksummed pages — Every data page includes a checksum. Corrupted pages are detected immediately and can be recovered from replicas or WAL.
  • Atomic operations — Multi-page updates are atomic. Either all pages in a transaction are updated, or none are. No partial writes.
  • Background checkpointing — Periodic checkpoints reduce recovery time without impacting foreground query performance.
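
A hedged sketch of the WAL append discipline: checksum, append, fsync, then acknowledge. The record layout is illustrative, and the toy FNV-1a checksum stands in for the hardware CRC32C named in the config below.
C
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

typedef struct {
    uint64_t lsn;        /* log sequence number */
    uint32_t length;     /* payload bytes */
    uint32_t checksum;   /* detects torn/partial records on replay */
} wal_header;

/* Toy FNV-1a checksum for the sketch only */
static uint32_t cksum(const void *p, size_t n) {
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++)
        h = (h ^ ((const uint8_t *)p)[i]) * 16777619u;
    return h;
}

int wal_append(int fd, uint64_t lsn, const void *payload, uint32_t len) {
    wal_header h = { lsn, len, cksum(payload, len) };
    if (write(fd, &h, sizeof h) != (ssize_t)sizeof h) return -1;
    if (write(fd, payload, len) != (ssize_t)len) return -1;
    /* Durability point: the commit is not acknowledged until this
       returns. Data pages can then be updated lazily. */
    return fsync(fd);
}
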
heliosdb.toml
# WAL configuration
[storage.wal]
enabled = true
sync_mode = "fsync"
# Options: fsync, fdatasync, async

# Checkpoint configuration
[storage.checkpoint]
interval = "5m"
wal_size_trigger = "256MB"
# Checkpoint when WAL exceeds 256MB
# or every 5 minutes, whichever first

# Page integrity
[storage.integrity]
page_checksums = true
checksum_algorithm = "crc32c"
# Hardware-accelerated on modern CPUs

# Recovery behavior
[storage.recovery]
mode = "automatic"
max_recovery_time = "30s"
# Typical recovery: < 2 seconds

Benefits Over Legacy Storage Engines

HeliosCore eliminates the compromises that come with building on top of generic key-value stores like RocksDB or LevelDB.

Characteristic        RocksDB / LevelDB             HeliosCore
-------------------   ---------------------------   ------------------------------
Write amplification   10-30x (LSM compaction)       1-3x (page-oriented)
Latency consistency   Spikes during compaction      Predictable P99
Compaction stalls     Yes, can block writes         No compaction needed
Database branching    Not supported                 Native, zero-cost CoW
Time-travel queries   Snapshot-based (expensive)    Native MVCC versioning
Vector data layout    Generic KV (suboptimal)       Vector-aware I/O paths
Space amplification   1.1-2x (level compaction)     Near 1x with compression
Memory management     Block cache + OS page cache   Unified buffer pool (O_DIRECT)
Crash recovery        WAL replay + manifest         Instant WAL recovery (<2s)
Compression           Per-block, single algorithm   Per-column, adaptive

Lower Write Amplification

Page-oriented design avoids the 10-30x write amplification of LSM-tree compaction. SSDs last longer and writes are faster.

Predictable Latency

No background compaction stalls. P99 latency is tight and consistent, even under heavy write workloads.

Branch-Native Storage

Copy-on-write at the storage level enables zero-cost branch creation. No external tool or workaround needed.

Adaptive Compression

Per-column algorithm selection. 70-90% storage reduction with zero manual tuning. 384x vector compression.

Direct I/O Control

Bypasses OS page cache for predictable memory usage. No double-buffering, no surprise evictions.

Instant Crash Recovery

Sub-2-second recovery from any crash. WAL replay proportional to recent changes, not database size.

Available in HeliosDB Lite and Full

HeliosCore powers the storage layer in both HeliosDB Lite and HeliosDB Full editions. Start building with the next-generation storage engine today.