
Vector Search Architecture

Last Updated: November 1, 2025 | Version: v6.0 Phase 2 M1 | Feature: F6.9 Hybrid Vector Search

Comprehensive architecture documentation for HeliosDB’s vector search capabilities, including HNSW indexing, hybrid search, and fusion algorithms.


Table of Contents

  1. Overview
  2. HNSW Index Architecture
  3. Hybrid Search Design
  4. Fusion Algorithms
  5. Query Processing
  6. Performance Optimizations
  7. Integration with Storage Layer

Overview

Vector Search Capabilities

HeliosDB provides state-of-the-art vector search with three primary modes:

  1. Dense Vector Search (HNSW): Semantic similarity using embeddings
  2. Sparse Vector Search (BM25): Keyword-based retrieval
  3. Hybrid Search: Combined dense + sparse with intelligent fusion

Performance Metrics (v6.0):

  • Search latency: <10ms on 100K vectors
  • Recall@10: 97%+ accuracy
  • Index build time: ~2 min for 100K vectors (see Benchmarks)
  • Memory overhead: ~50 bytes per vector (HNSW)

HNSW Index Architecture

Hierarchical Navigable Small World (HNSW)

Core Concept: Multi-layer proximity graph for approximate nearest neighbor search

Layer 2:  A ─────────────────────────────── F        (longest connections)
          │                                 │
Layer 1:  A ────── B ────── C ────── F ──── G        (medium connections)
          │        │        │        │      │
Layer 0:  A ─ B ── C ── D ── E ── F ── G ── H        (all nodes, shortest hops)

Key Properties:

  • Multi-layer structure: Higher layers have sparser connections
  • Navigable: Greedy search from top layer to bottom
  • Small world: Short path between any two nodes
  • Logarithmic search: O(log N) average case

Data Structures

Node Structure:

struct HnswNode {
    id: VectorId,
    vector: Vec<f32>,              // Dense embedding (e.g., 768 dims)
    layer: usize,                  // Maximum layer this node appears in
    neighbors: Vec<Vec<VectorId>>, // Neighbors per layer
}

Index Structure:

pub struct HnswIndex {
    nodes: HashMap<VectorId, HnswNode>,
    entry_point: VectorId,  // Top-layer entry node
    max_layer: usize,
    m: usize,               // Max connections per layer (default: 16)
    ef_construction: usize, // Build-time search width (default: 64)
    distance_fn: DistanceFunction,
}

pub enum DistanceFunction {
    Cosine,     // 1 - dot(a, b) / (||a|| * ||b||)
    Euclidean,  // sqrt(sum((a_i - b_i)^2))
    DotProduct, // -dot(a, b)
}
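
As a point of reference, the three variants map to simple scalar formulas. The following is a minimal dependency-free sketch for illustration, not HeliosDB's optimized implementation (the SIMD version is covered under Performance Optimizations):

```rust
// Scalar reference implementations of the three DistanceFunction variants.
// Illustrative sketch only; all three return a value where smaller = closer.

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Cosine distance: 1 - dot(a, b) / (||a|| * ||b||)
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    1.0 - dot(a, b) / (dot(a, a).sqrt() * dot(b, b).sqrt())
}

/// Euclidean distance: sqrt(sum((a_i - b_i)^2))
fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

/// Negated dot product, so the nearest neighbor is still the minimum.
fn dot_product_distance(a: &[f32], b: &[f32]) -> f32 {
    -dot(a, b)
}

fn main() {
    let a = [1.0, 0.0];
    let b = [0.0, 1.0];
    println!("{}", cosine_distance(&a, &b));    // orthogonal vectors → distance 1.0
    println!("{}", euclidean_distance(&a, &b)); // sqrt(2)
    println!("{}", dot_product_distance(&a, &a));
}
```

Negating the dot product keeps the "minimize distance" convention uniform across all three variants, so the search code never needs to branch on the metric.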

Index Construction Algorithm

Insertion Process:

fn insert(&mut self, vector: Vec<f32>, id: VectorId) {
    // 1. Determine the top layer for the new node (exponential decay):
    //    P(layer >= l) = M^(-l) with the m_L = 1/ln(M) normalization below
    let layer = self.select_layer();

    // 2. Greedily descend from the global entry point down to layer + 1
    let mut entry_point = self.entry_point;
    for level in (layer + 1..=self.max_layer).rev() {
        // Search the layer without adding connections (ef = 1)
        entry_point = self.search_layer(entry_point, &vector, level, 1)[0].0;
    }

    // 3. Insert the node and create connections on layers layer..=0
    for level in (0..=layer).rev() {
        // Find the ef_construction nearest candidates, keep the best M
        let candidates = self.search_layer(entry_point, &vector, level, self.ef_construction);
        let neighbors = self.select_neighbors(candidates, self.m);

        // Create bidirectional links
        for neighbor in neighbors {
            self.add_connection(id, neighbor, level);
            self.add_connection(neighbor, id, level);
            // Prune the neighbor's connection list if it now exceeds M
            self.prune_connections(neighbor, level);
        }
    }
}

Layer Selection (exponential decay):

fn select_layer(&self) -> usize {
    // m_L = 1/ln(M) gives P(layer >= l) = M^(-l): most nodes stay on layer 0
    let m_l = 1.0 / (self.m as f64).ln();
    (-rand::random::<f64>().ln() * m_l).floor() as usize
}
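
Plugging in the default M = 16 gives m_L = 1/ln 16 ≈ 0.36, so P(layer ≥ l) = 16^(−l): roughly 93.75% of nodes stay on layer 0 and only about 6.25% reach layer 1. A quick simulation confirms the decay; a tiny xorshift PRNG is inlined here purely to keep the sketch dependency-free (the `rand` call above is the real approach):

```rust
// Simulate HNSW layer selection for M = 16 and observe the exponential decay.
// XorShift64 is an illustrative stand-in for a real RNG, not HeliosDB code.

fn select_layer(m: usize, u: f64) -> usize {
    let m_l = 1.0 / (m as f64).ln();
    (-u.ln() * m_l).floor() as usize
}

// Minimal xorshift64 generator mapped to [0, 1).
struct XorShift64(u64);
impl XorShift64 {
    fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

fn main() {
    let mut rng = XorShift64(0x9E3779B97F4A7C15);
    let mut counts = [0usize; 8];
    let n = 100_000;
    for _ in 0..n {
        // Guard against u = 0, where ln(u) is undefined
        let layer = select_layer(16, rng.next_f64().max(f64::MIN_POSITIVE));
        counts[layer.min(7)] += 1;
    }
    // Expect ~93.75% on layer 0, ~5.9% on layer 1, and a rapid tail after that
    for (l, c) in counts.iter().enumerate() {
        println!("layer {l}: {:.2}%", 100.0 * *c as f64 / n as f64);
    }
}
```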

Complexity:

  • Insert: O(log N * M * ef_construction)
  • Search: O(log N * ef_search)
  • Space: O(N * M * avg_layer)

Search Algorithm

Greedy Search (from top to bottom):

fn search(&self, query: &[f32], k: usize, ef_search: usize) -> Vec<(VectorId, f32)> {
    let mut entry_point = self.entry_point;

    // Phase 1: greedy descent through the upper layers (ef = 1)
    for level in (1..=self.max_layer).rev() {
        entry_point = self.search_layer(entry_point, query, level, 1)[0].0;
    }

    // Phase 2: expanded search on the bottom layer
    let candidates = self.search_layer(entry_point, query, 0, ef_search);

    // Phase 3: candidates are already sorted by ascending distance; keep the top k
    candidates.into_iter().take(k).collect()
}

// Returns up to `ef` nodes on `level` closest to `query`, sorted by ascending
// distance. (Pseudocode: f32 is not Ord, so a real implementation wraps
// distances in an ordered-float type before pushing them onto the heaps.)
fn search_layer(&self, entry: VectorId, query: &[f32], level: usize, ef: usize)
    -> Vec<(VectorId, f32)>
{
    let mut visited = HashSet::new();
    let mut candidates = BinaryHeap::new(); // Min-heap by distance (via Reverse)
    let mut results = BinaryHeap::new();    // Max-heap: worst kept result on top

    // Initialize with the entry point
    let dist = distance(&self.nodes[&entry].vector, query);
    candidates.push((Reverse(dist), entry));
    results.push((dist, entry));
    visited.insert(entry);

    while let Some((Reverse(dist), current)) = candidates.pop() {
        // Stop once the closest unexplored candidate is farther than the worst kept result
        if dist > results.peek().unwrap().0 {
            break;
        }
        // Explore the current node's neighbors on this layer
        for &neighbor in &self.nodes[&current].neighbors[level] {
            if visited.insert(neighbor) {
                let neighbor_dist = distance(&self.nodes[&neighbor].vector, query);
                // Keep the neighbor if the result set is not yet full,
                // or if it beats the current worst result
                if results.len() < ef || neighbor_dist < results.peek().unwrap().0 {
                    candidates.push((Reverse(neighbor_dist), neighbor));
                    results.push((neighbor_dist, neighbor));
                    // Drop the worst result once more than ef are held
                    if results.len() > ef {
                        results.pop();
                    }
                }
            }
        }
    }

    // Drain the heap into a list sorted by ascending distance
    let mut out: Vec<_> = results.into_iter().map(|(d, id)| (id, d)).collect();
    out.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
    out
}

Search Parameters:

  • ef_search: Search expansion factor (higher = better recall, slower)
  • Typical values: 50-200
  • Trade-off: recall vs latency

Hybrid Search Design

Architecture Overview

Hybrid Search combines dense (semantic) and sparse (keyword) retrieval:

               Query: "wireless headphones"
                          │
         ┌────────────────┼────────────────┐
         ▼                ▼                ▼
   Dense Vector     Sparse Vector      Full-Text
      (HNSW)           (BM25)           Search
         │                │                │
   [0.82, 0.91]    "wireless": 2          ...
   [0.78, 0.89]    "headphones": 3
       ...               ...
         │                │                │
         └────────────────┼────────────────┘
                          ▼
                  Fusion Algorithm
             (RRF / Weighted / Learned)
                          │
                          ▼
                   Merged Results
                 [doc_1, doc_2, ...]

Data Storage

Table Schema:

CREATE TABLE products (
    id          BIGSERIAL PRIMARY KEY,
    name        TEXT,
    description TEXT,
    -- Dense vector (from embedding model)
    embedding   VECTOR(768),
    -- Sparse vector (BM25 term weights)
    keywords    SPARSEVEC(10000),
    -- Full-text search vector
    search_tsv  TSVECTOR
);

-- Indexes
CREATE INDEX idx_embedding ON products USING hnsw (embedding vector_cosine_ops);
CREATE INDEX idx_keywords ON products USING gin (keywords);
CREATE INDEX idx_search_tsv ON products USING gin (search_tsv);

Fusion Algorithms

1. Reciprocal Rank Fusion (RRF)

Formula:

RRF_score(doc) = Σ (1 / (k + rank_i(doc)))

Implementation:

pub fn reciprocal_rank_fusion(
    dense_results: Vec<(DocId, f32)>, // [(doc, score), ...], best first
    sparse_results: Vec<(DocId, f32)>,
    k: usize, // Rank constant (typically 60)
) -> Vec<(DocId, f32)> {
    let mut scores: HashMap<DocId, f32> = HashMap::new();

    // Add dense contributions: 1 / (k + rank), where rank + 1 is the 1-based rank
    for (rank, (doc_id, _score)) in dense_results.iter().enumerate() {
        *scores.entry(*doc_id).or_insert(0.0) += 1.0 / (k as f32 + rank as f32 + 1.0);
    }
    // Add sparse contributions the same way
    for (rank, (doc_id, _score)) in sparse_results.iter().enumerate() {
        *scores.entry(*doc_id).or_insert(0.0) += 1.0 / (k as f32 + rank as f32 + 1.0);
    }

    // Sort by descending RRF score
    let mut results: Vec<_> = scores.into_iter().collect();
    results.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    results
}

Characteristics:

  • Simple, parameter-free (except k)
  • Handles different score scales
  • Good default choice
  • ⚠ Ignores original score magnitudes
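
A worked example with k = 60 shows how tight the margins are: a document ranked 1st in one list and 3rd in the other scores 1/61 + 1/63 ≈ 0.032266, narrowly beating a document ranked 2nd in both lists (2/62 ≈ 0.032258). A self-contained sketch, with DocId simplified to a u64 for illustration:

```rust
use std::collections::HashMap;

// Minimal RRF over any number of ranked id lists. Illustrative sketch;
// `u64` stands in for the document's DocId type.
fn rrf(ranked_lists: &[&[u64]], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for list in ranked_lists {
        for (rank, &doc) in list.iter().enumerate() {
            // rank is 0-based, so rank + 1 is the 1-based rank in the formula
            *scores.entry(doc).or_insert(0.0) += 1.0 / (k + (rank + 1) as f64);
        }
    }
    let mut out: Vec<_> = scores.into_iter().collect();
    out.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    out
}

fn main() {
    let dense = [1u64, 2, 3];  // doc 1 ranked 1st by dense search
    let sparse = [3u64, 2, 1]; // doc 3 ranked 1st by sparse search
    for (doc, score) in rrf(&[&dense[..], &sparse[..]], 60.0) {
        println!("doc {doc}: {score:.6}");
    }
    // Docs 1 and 3 (1st in one list, 3rd in the other) score 1/61 + 1/63,
    // a hair above doc 2 (2nd in both), which scores 2/62.
}
```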

2. Weighted Fusion

Formula:

score(doc) = α * norm(dense_score) + (1-α) * norm(sparse_score)

Implementation:

pub fn weighted_fusion(
    dense_results: Vec<(DocId, f32)>,
    sparse_results: Vec<(DocId, f32)>,
    alpha: f32, // Weight for dense (0.0 - 1.0)
) -> Vec<(DocId, f32)> {
    let mut scores: HashMap<DocId, f32> = HashMap::new();

    // Normalize dense scores to [0, 1]
    // (lists are assumed non-empty and sorted descending by score)
    let max_dense = dense_results[0].1;
    for (doc_id, score) in dense_results {
        *scores.entry(doc_id).or_insert(0.0) += alpha * (score / max_dense);
    }
    // Normalize sparse scores to [0, 1]
    let max_sparse = sparse_results[0].1;
    for (doc_id, score) in sparse_results {
        *scores.entry(doc_id).or_insert(0.0) += (1.0 - alpha) * (score / max_sparse);
    }

    // Sort by descending weighted score
    let mut results: Vec<_> = scores.into_iter().collect();
    results.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    results
}

Characteristics:

  • Preserves score information
  • Tunable weight (α)
  • ⚠ Requires score normalization
  • ⚠ Sensitive to score distributions

Optimal α selection:

  • Semantic queries: α = 0.7-0.9 (favor dense)
  • Keyword queries: α = 0.1-0.3 (favor sparse)
  • Mixed queries: α = 0.5 (balanced)
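
As a numeric sanity check with α = 0.7: a document with normalized dense score 1.0 and normalized sparse score 0.25 gets 0.7 × 1.0 + 0.3 × 0.25 = 0.775, beating a keyword-dominant document even when the raw BM25 scores live on a completely different scale. The sketch below is illustrative (DocId simplified to u64) and additionally guards against empty result lists, which the implementation above assumes are non-empty:

```rust
use std::collections::HashMap;

// Weighted fusion worked example. Each input list is assumed sorted
// descending by score; empty lists are skipped rather than causing a panic.
fn weighted(dense: &[(u64, f32)], sparse: &[(u64, f32)], alpha: f32) -> Vec<(u64, f32)> {
    let mut scores: HashMap<u64, f32> = HashMap::new();
    for (results, weight) in [(dense, alpha), (sparse, 1.0 - alpha)] {
        // Max score is first because the list is sorted descending
        if let Some(&(_, max)) = results.first() {
            for &(doc, s) in results {
                *scores.entry(doc).or_insert(0.0) += weight * (s / max);
            }
        }
    }
    let mut out: Vec<_> = scores.into_iter().collect();
    out.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    out
}

fn main() {
    let dense = [(1u64, 0.9), (2, 0.5)];
    let sparse = [(2u64, 8.0), (1, 2.0)]; // BM25 scores use a different scale
    for (doc, score) in weighted(&dense, &sparse, 0.7) {
        println!("doc {doc}: {score:.3}");
    }
    // doc 1: 0.7 * 1.0 + 0.3 * 0.25 = 0.775
    // doc 2: 0.7 * (0.5 / 0.9) + 0.3 * 1.0 ≈ 0.689
}
```

Per-list normalization is what makes a single α meaningful: without it, raw BM25 magnitudes would swamp cosine similarities regardless of the weight chosen.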

3. Pre-Filter Fusion

Strategy: Filter with sparse, then re-rank with dense

Implementation:

pub fn pre_filter_fusion(
    query_embedding: Vec<f32>,
    sparse_results: Vec<DocId>, // Top-N candidates from BM25, best first
    top_k: usize,
) -> Vec<(DocId, f32)> {
    // Phase 1: filter to the top-N sparse candidates (e.g., N = 100)
    let candidates = sparse_results.into_iter().take(100).collect::<Vec<_>>();

    // Phase 2: re-rank the candidates by dense similarity
    let mut dense_scores = Vec::new();
    for doc_id in candidates {
        let embedding = get_embedding(doc_id);
        let score = cosine_similarity(&query_embedding, &embedding);
        dense_scores.push((doc_id, score));
    }

    // Phase 3: sort by descending dense score and keep top_k
    dense_scores.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    dense_scores.into_iter().take(top_k).collect()
}

Characteristics:

  • Fast (only re-rank top-N)
  • Good for keyword + semantic queries
  • ⚠ May miss dense-only matches

4. Learned Fusion (ML-based)

Strategy: Train ML model to predict optimal fusion weights

Model:

pub struct LearnedFusion {
    model: LinearRegression, // Or LightGBM, neural network
}

impl LearnedFusion {
    pub fn predict_score(
        &self,
        dense_score: f32,
        sparse_score: f32,
        dense_rank: usize,
        sparse_rank: usize,
        query_features: Vec<f32>, // Query length, term overlap, etc.
    ) -> f32 {
        let features = vec![
            dense_score,
            sparse_score,
            dense_rank as f32,
            sparse_rank as f32,
            query_features[0], // Query length
            query_features[1], // Term count
            // ... more features
        ];
        self.model.predict(&features)
    }
}

Training Data:

  • Query-document relevance labels
  • Features: scores, ranks, query characteristics
  • Loss: NDCG@10 or MRR

Characteristics:

  • Best performance (when trained)
  • Adapts to data distribution
  • ⚠ Requires training data
  • ⚠ More complex

Query Processing

SQL Interface

Basic Vector Search:

SELECT id, name, embedding <=> query_embedding AS distance
FROM products
ORDER BY distance
LIMIT 10;

Hybrid Search:

SELECT * FROM hybrid_search(
    table_name       := 'products',
    query_text       := 'wireless headphones',
    query_embedding  := get_embedding('wireless headphones'),
    fusion_algorithm := 'rrf', -- 'rrf' | 'weighted' | 'prefilter' | 'learned'
    k                := 60,    -- RRF parameter
    alpha            := 0.7,   -- Weighted fusion parameter
    limit            := 10
);

Query Execution Plan

HybridSearchExec
├─ DenseSearchExec (HNSW)
│  └─ IndexScan(products, idx_embedding)
├─ SparseSearchExec (BM25)
│  └─ IndexScan(products, idx_keywords)
└─ FusionExec (RRF)
   └─ TopK(10)

Performance Optimizations

1. SIMD Distance Computation

Cosine Similarity (AVX2):

#[cfg(target_arch = "x86_64")]
// Note: despite the name, this returns cosine *distance* (1 - similarity)
unsafe fn cosine_similarity_simd(a: &[f32], b: &[f32]) -> f32 {
    use std::arch::x86_64::*;

    let mut dot = _mm256_setzero_ps();
    let mut norm_a = _mm256_setzero_ps();
    let mut norm_b = _mm256_setzero_ps();

    // Process 8 floats at a time (assumes a.len() is a multiple of 8,
    // which holds for common embedding dims such as 768)
    for i in (0..a.len()).step_by(8) {
        let va = _mm256_loadu_ps(&a[i]);
        let vb = _mm256_loadu_ps(&b[i]);
        dot = _mm256_fmadd_ps(va, vb, dot);
        norm_a = _mm256_fmadd_ps(va, va, norm_a);
        norm_b = _mm256_fmadd_ps(vb, vb, norm_b);
    }

    // Horizontal sums (hsum_ps_avx reduces a __m256 to a single f32)
    let dot_sum = hsum_ps_avx(dot);
    let norm_a_sum = hsum_ps_avx(norm_a);
    let norm_b_sum = hsum_ps_avx(norm_b);

    1.0 - (dot_sum / (norm_a_sum.sqrt() * norm_b_sum.sqrt()))
}

Speedup: 4-8x vs scalar

2. Quantization

Product Quantization (compress vectors):

pub struct ProductQuantizer {
    codebooks: Vec<Vec<Vec<f32>>>, // [subspace][centroid][dims]
    m: usize,    // Number of subspaces
    ksub: usize, // Centroids per subspace
}

// m = 96 subspaces × ksub = 256 centroids → 1 byte per subspace, compressing
// a 768-dim f32 vector (3,072 bytes) to 96 bytes of codes (32x smaller)

Speedup: 5-10x faster search; raw vectors shrink 32x (3,072 B → 96 B)

3. Caching

Query Result Cache:

pub struct QueryCache {
    cache: LruCache<QueryHash, Vec<(DocId, f32)>>,
}

// Cache hot queries (e.g., "wireless headphones")
// Hit rate: 30-50% for production workloads
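
`LruCache` above comes from an external crate. As a dependency-free illustration of the same idea, here is a simplified sketch (`SimpleQueryCache` is a hypothetical name) that keys results by a hash of the query text and evicts in insertion order rather than true LRU:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, VecDeque};
use std::hash::{Hash, Hasher};

// Simplified query cache: results keyed by query-text hash, oldest entry
// evicted past capacity. (A real LruCache also promotes entries on read.)
struct SimpleQueryCache {
    capacity: usize,
    map: HashMap<u64, Vec<(u64, f32)>>, // query hash -> [(doc_id, score)]
    order: VecDeque<u64>,               // insertion order of query hashes
}

impl SimpleQueryCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    fn key(query: &str) -> u64 {
        let mut h = DefaultHasher::new();
        query.hash(&mut h);
        h.finish()
    }

    fn get(&self, query: &str) -> Option<&Vec<(u64, f32)>> {
        self.map.get(&Self::key(query))
    }

    fn put(&mut self, query: &str, results: Vec<(u64, f32)>) {
        let k = Self::key(query);
        if self.map.insert(k, results).is_none() {
            self.order.push_back(k);
            // Evict the oldest entry once over capacity
            if self.order.len() > self.capacity {
                if let Some(old) = self.order.pop_front() {
                    self.map.remove(&old);
                }
            }
        }
    }
}

fn main() {
    let mut cache = SimpleQueryCache::new(2);
    cache.put("wireless headphones", vec![(1, 0.9), (2, 0.8)]);
    cache.put("usb-c charger", vec![(7, 0.7)]);
    cache.put("laptop stand", vec![(4, 0.6)]); // evicts the oldest entry
    println!("{}", cache.get("wireless headphones").is_some()); // false: evicted
    println!("{}", cache.get("laptop stand").is_some());        // true
}
```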

Integration with Storage Layer

Index Persistence

File Format:

hnsw_index.bin
├─ Header (metadata)
│  ├─ version: u32
│  ├─ num_vectors: u64
│  ├─ dim: u32
│  ├─ m: u32
│  └─ max_layer: u32
├─ Node Data
│  ├─ Node 1: [id, layer, vector, neighbors]
│  ├─ Node 2: [id, layer, vector, neighbors]
│  └─ ...
└─ Entry Point: VectorId

Mmap Loading (zero-copy):

let file = File::open("hnsw_index.bin")?;
let mmap = unsafe { Mmap::map(&file)? };
let index = HnswIndex::from_bytes(&mmap)?;

Loading Time: ~1 second for 1M vectors


Benchmarks

Latency (100K vectors, 768 dims)

| Operation | Latency (P50) | Latency (P99) |
|---|---|---|
| Dense search (k=10) | 3.2 ms | 8.1 ms |
| Sparse search (k=10) | 1.8 ms | 4.2 ms |
| Hybrid search (RRF, k=10) | 5.1 ms | 11.3 ms |

Recall@10 Accuracy

| Method | Recall@10 | QPS (1 thread) |
|---|---|---|
| Exact search | 100% | 12 |
| HNSW (ef=50) | 95.2% | 420 |
| HNSW (ef=100) | 97.8% | 310 |
| HNSW (ef=200) | 99.1% | 180 |

Scalability (HNSW, ef=100)

| Vectors | Build Time | Index Size | Search Latency |
|---|---|---|---|
| 10K | 12 sec | 4.8 MB | 0.8 ms |
| 100K | 2.1 min | 48 MB | 3.2 ms |
| 1M | 22 min | 480 MB | 12.1 ms |
| 10M | 3.8 hr | 4.8 GB | 45.3 ms |

Maintained by: HeliosDB Vector Search Team
