Vector Index Tuning Guide
Version: v7.0 Status: Production Ready
Overview
Proper index configuration is critical for vector search performance. This guide covers HNSW and IVF index tuning, memory optimization, and production best practices.
HNSW Index Configuration
HNSW (Hierarchical Navigable Small World) is the recommended index type for most workloads, offering high recall with low latency.
Core Parameters
M (Max Connections)
Controls the number of bidirectional links each node maintains in the graph.
```sql
-- Default: M = 16
CREATE INDEX idx ON documents USING hnsw (embedding vector_cosine_ops) WITH (m = 16);
```

| M Value | Memory | Build Time | Query Speed | Recall |
|---|---|---|---|---|
| 8 | Low | Fast | Fast | ~90% |
| 16 | Medium | Medium | Medium | ~95% |
| 32 | High | Slow | Medium | ~97% |
| 48 | Very High | Very Slow | Slower | ~98% |
Recommendations:
- General purpose: M = 16
- High recall required: M = 32
- Memory constrained: M = 8
- Maximum accuracy: M = 48
ef_construction (Build Quality)
Controls search width during index construction. Higher values create better graph connectivity.
```sql
-- Default: ef_construction = 200
CREATE INDEX idx ON documents USING hnsw (embedding vector_cosine_ops) WITH (ef_construction = 200);
```

| ef_construction | Build Time | Index Quality | Recall Impact |
|---|---|---|---|
| 50 | Very Fast | Low | -5 to -10% recall |
| 100 | Fast | Medium | -2 to -5% recall |
| 200 | Medium | Good | Baseline |
| 400 | Slow | High | +1 to +2% recall |
| 800 | Very Slow | Maximum | +2 to +3% recall |
Recommendations:
- Development/testing: ef_construction = 100
- Production (balanced): ef_construction = 200
- Production (high quality): ef_construction = 400
- Critical applications: ef_construction = 600+
ef_search (Query Quality)
Controls search width at query time. This is a runtime parameter, not an index setting.
```sql
-- Set for current session
SET hnsw.ef_search = 100;
```
```sql
-- Set globally
ALTER SYSTEM SET hnsw.ef_search = 100;
SELECT pg_reload_conf();
```

| ef_search | Query Time | Recall@10 | Use Case |
|---|---|---|---|
| 20 | ~1ms | ~85% | Ultra-low latency |
| 40 | ~2ms | ~92% | Default |
| 100 | ~5ms | ~96% | Balanced |
| 200 | ~10ms | ~98% | High accuracy |
| 500 | ~25ms | ~99% | Maximum recall |
Recommendations:
- Real-time search: ef_search = 40-50
- Balanced workload: ef_search = 100
- High accuracy: ef_search = 200
- Critical queries: ef_search = 300+
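To make the latency/accuracy tradeoff concrete, here is a small helper, a sketch only, using the illustrative timing figures from the table above rather than measured numbers, that picks the largest `ef_search` whose typical query time fits a latency budget:

```python
# Illustrative (query_time_ms, ef_search) pairs from the table above
EF_SEARCH_TABLE = [(1, 20), (2, 40), (5, 100), (10, 200), (25, 500)]

def pick_ef_search(latency_budget_ms: float) -> int:
    """Return the largest ef_search whose typical query time fits the budget."""
    best = EF_SEARCH_TABLE[0][1]
    for ms, ef in EF_SEARCH_TABLE:
        if ms <= latency_budget_ms:
            best = ef
    return best

print(pick_ef_search(10))  # 200: a ~10ms budget allows the high-accuracy setting
```

In production you would replace the table with p99 latencies measured on your own hardware and embeddings.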
Combined Configuration Examples
```sql
-- Fast queries, acceptable recall (production web search)
CREATE INDEX idx_fast ON documents USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 200);
-- Use with: SET hnsw.ef_search = 50;

-- Balanced (recommended for most use cases)
CREATE INDEX idx_balanced ON documents USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 400);
-- Use with: SET hnsw.ef_search = 100;

-- High accuracy (recommendation systems, RAG)
CREATE INDEX idx_accurate ON documents USING hnsw (embedding vector_cosine_ops) WITH (m = 32, ef_construction = 400);
-- Use with: SET hnsw.ef_search = 200;

-- Maximum recall (medical, legal, compliance)
CREATE INDEX idx_max_recall ON documents USING hnsw (embedding vector_cosine_ops) WITH (m = 48, ef_construction = 600);
-- Use with: SET hnsw.ef_search = 400;
```

IVF Index Configuration
The IVF (Inverted File) index is recommended for memory-constrained environments or billion-scale datasets.
Core Parameters
lists (Number of Clusters)
Controls the number of partitions (clusters) in the index.
```sql
-- Default: lists = sqrt(N), where N is the row count
CREATE INDEX idx ON documents USING ivfflat (embedding vector_l2_ops) WITH (lists = 1000);
```

| Dataset Size | Recommended lists | Memory Overhead |
|---|---|---|
| <100K | 100-500 | Minimal |
| 100K-1M | 500-2000 | Low |
| 1M-10M | 2000-8000 | Medium |
| 10M-100M | 8000-32000 | Higher |
| >100M | 32000-65536 | High |
Formula: lists ≈ sqrt(N) is a good starting point.
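The sqrt(N) heuristic can be computed directly; a minimal sketch (clamping to the 100-65536 range from the table above is an illustrative choice, not a hard rule):

```python
import math

def recommended_lists(n_rows: int) -> int:
    """Starting point for the IVF `lists` parameter: sqrt(N),
    clamped to the 100-65536 range shown in the table above."""
    return max(100, min(65536, round(math.sqrt(n_rows))))

print(recommended_lists(1_000_000))  # 1000
```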
nprobe (Search Probes)
Controls how many clusters to search at query time.
```sql
-- Set for current session
SET ivf.nprobe = 20;
```

| nprobe (% of lists) | Query Time | Recall | Use Case |
|---|---|---|---|
| 1% | Very Fast | ~80% | Ultra-low latency |
| 5% | Fast | ~90% | Default |
| 10% | Medium | ~95% | Balanced |
| 20% | Slow | ~98% | High accuracy |
IVF Configuration Examples
```sql
-- Memory efficient (large datasets)
CREATE INDEX idx_ivf ON documents USING ivfflat (embedding vector_l2_ops) WITH (lists = 4096);
-- Use with: SET ivf.nprobe = 50;

-- Fast search, lower accuracy
CREATE INDEX idx_ivf_fast ON documents USING ivfflat (embedding vector_l2_ops) WITH (lists = 1000);
-- Use with: SET ivf.nprobe = 10;
```

Product Quantization (PQ)
For extreme memory reduction (8-16x), use IVF with product quantization:
```sql
-- IVF with PQ (memory efficient)
CREATE INDEX idx_ivf_pq ON documents USING ivfpq (embedding vector_l2_ops) WITH (
    lists = 4096,
    pq_segments = 32,  -- Divide vector into 32 sub-vectors
    pq_bits = 8        -- 8 bits per sub-vector (256 centroids)
);
```

| Configuration | Memory/Vector | Recall Impact |
|---|---|---|
| IVF-Flat | ~4 bytes/dim | Baseline |
| IVF-SQ8 | ~1 byte/dim | -2 to -5% |
| IVF-PQ (m=32) | ~32 bytes total | -5 to -10% |
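The per-vector figures in the table follow from the storage scheme; a sketch of the arithmetic (counting stored vector codes only; cluster centroids and index metadata add further overhead on top, which is why total savings are smaller than the raw code ratio):

```python
def bytes_per_vector(dim: int, scheme: str = "ivf_flat", pq_segments: int = 32) -> int:
    """Approximate per-vector storage for the schemes in the table above."""
    if scheme == "ivf_flat":  # one 4-byte float per dimension
        return dim * 4
    if scheme == "ivf_sq8":   # one byte per dimension (scalar quantization)
        return dim
    if scheme == "ivf_pq":    # one 8-bit code per sub-vector
        return pq_segments
    raise ValueError(f"unknown scheme: {scheme}")

# 768-dim embeddings: 3072 B flat vs 32 B of PQ codes per vector
print(bytes_per_vector(768), bytes_per_vector(768, "ivf_pq"))
```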
Memory vs Accuracy Tradeoffs
Memory Usage Formula
HNSW:
```
Memory ≈ N × (D × 4 + M × 8) bytes
```

Where N = number of vectors, D = vector dimension, M = connections per node.

Examples (HNSW):
| Vectors | Dimensions | M | Memory |
|---|---|---|---|
| 100K | 384 | 16 | ~160 MB |
| 1M | 768 | 16 | ~3.2 GB |
| 10M | 768 | 16 | ~32 GB |
| 10M | 768 | 32 | ~44 GB |
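The formula translates directly into a sizing helper; a sketch (the formula is first-order, so treat its output as a rough lower bound; real indexes carry extra per-layer and metadata overhead):

```python
def hnsw_memory_bytes(n_vectors: int, dim: int, m: int) -> int:
    """Memory ≈ N × (D × 4 + M × 8) bytes, per the formula above."""
    return n_vectors * (dim * 4 + m * 8)

# 1M 768-dim vectors with M=16 -> 3.2 GB, matching the table row
print(hnsw_memory_bytes(1_000_000, 768, 16) / 1e9, "GB")
```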
IVF:
```
Memory ≈ N × (D × 4) + lists × (D × 4) bytes     [IVF-Flat]
Memory ≈ N × pq_segments + lists × (D × 4) bytes [IVF-PQ]
```

Choosing the Right Index

```
Do you have < 1 million vectors?
├── YES → Use HNSW (better recall, simpler tuning)
└── NO
    ├── Do you have memory constraints?
    │   ├── YES → Use IVF-PQ (lowest memory, ~90% recall)
    │   └── NO  → Use HNSW with M=16 or IVF-Flat
    └── Do you need > 95% recall?
        ├── YES → HNSW with M=32, ef=200+
        └── NO  → IVF with sufficient nprobe
```

Index Building Parameters
Build Time Optimization
```sql
-- Parallel index building (when supported)
SET max_parallel_maintenance_workers = 8;
SET maintenance_work_mem = '4GB';

-- Then create the index
CREATE INDEX CONCURRENTLY idx ON documents USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 200);
```

Incremental Building
For large datasets, consider batch loading:
```sql
-- 1. Create index on empty table
CREATE TABLE documents (
    id SERIAL,
    embedding VECTOR(768)
);

CREATE INDEX idx ON documents USING hnsw (embedding vector_cosine_ops);

-- 2. Batch insert data (index updates incrementally)
INSERT INTO documents (embedding)
SELECT embedding FROM source_table
LIMIT 100000 OFFSET 0;

-- Repeat for remaining batches...
```

Monitoring Build Progress

```sql
-- Check index build progress
SELECT phase,
       tuples_done,
       tuples_total,
       ROUND(100.0 * tuples_done / NULLIF(tuples_total, 0), 2) AS pct_complete
FROM pg_stat_progress_create_index
WHERE datname = current_database();
```

Rebuilding Indexes
When to Rebuild
Rebuild your vector index when:
- Recall drops below acceptable threshold
- Query latency increases significantly
- After large bulk deletes (>20% of data)
- After significant data distribution changes
- When upgrading index parameters
Rebuild Commands
```sql
-- Offline rebuild (fastest, but blocks queries)
REINDEX INDEX idx_documents_embedding;

-- Online rebuild (no blocking)
REINDEX INDEX CONCURRENTLY idx_documents_embedding;

-- Rebuild with new parameters
DROP INDEX idx_documents_embedding;
CREATE INDEX idx_documents_embedding ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 400);  -- New parameters
```

Scheduled Rebuilds
For production systems, schedule periodic rebuilds:
```sql
-- Example: Weekly rebuild during maintenance window
-- Run via cron or scheduled job

-- 1. Check if a rebuild is needed
SELECT pg_relation_size('idx_documents_embedding') AS current_size,
       pg_table_size('documents') * 0.15 AS expected_overhead;

-- 2. Rebuild if fragmented
REINDEX INDEX CONCURRENTLY idx_documents_embedding;

-- 3. Analyze table for the query optimizer
ANALYZE documents;
```

Performance Monitoring
Query Performance Analysis
```sql
-- Enable timing
\timing on

-- Check query plan and execution time
EXPLAIN (ANALYZE, BUFFERS, TIMING)
SELECT id, title
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::VECTOR
LIMIT 10;

-- Look for:
-- - "Index Scan using idx_documents_embedding"
-- - Execution Time < 10ms for HNSW
-- - Rows returned matches LIMIT
```

Index Statistics
```sql
-- Index size and row estimates
SELECT indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS size,
       idx_scan AS scans,
       idx_tup_read AS tuples_read,
       idx_tup_fetch AS tuples_fetched
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
  AND indexrelname LIKE '%embedding%';

-- Vector-specific stats
SELECT * FROM hnsw_index_stats('idx_documents_embedding');
```

Recall Measurement
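Recall@k can also be computed client-side by comparing two id lists: the exact (brute-force) top-k and the top-k returned by the index. A minimal sketch:

```python
def recall_at_k(exact_ids, approx_ids, k=10):
    """Fraction of the true top-k neighbors that the index returned."""
    return len(set(exact_ids[:k]) & set(approx_ids[:k])) / k

exact  = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # brute-force ORDER BY, no index
approx = [1, 2, 3, 4, 5, 6, 7, 8, 11, 12]  # indexed search missed ids 9 and 10
print(recall_at_k(exact, approx))  # 0.8
```

Averaging this over a few hundred representative query vectors gives a stable recall estimate for a given `ef_search` or `nprobe` setting.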
```sql
-- Create a test set with known ground truth
-- 1. Exact top-100 neighbors (brute-force scan)
WITH exact AS (
    SELECT id, embedding <=> query_vector AS distance
    FROM documents
    ORDER BY embedding <=> query_vector
    LIMIT 100
)
-- 2. Approximate top-10 (using the index), scored against the exact top-10
SELECT COUNT(*) FILTER (WHERE a.id IN (SELECT id FROM exact LIMIT 10)) * 10 AS recall_at_10
FROM (
    SELECT id FROM documents
    ORDER BY embedding <=> query_vector
    LIMIT 10
) a;
```

Production Configuration Profiles
Web Search (Low Latency)
```sql
-- Index configuration
CREATE INDEX idx_web ON documents USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 200);

-- Runtime settings
SET hnsw.ef_search = 40;

-- Expected: <5ms p99, ~92% recall
```

E-Commerce (Balanced)

```sql
-- Index configuration
CREATE INDEX idx_ecom ON products USING hnsw (embedding vector_cosine_ops) WITH (m = 24, ef_construction = 300);

-- Runtime settings
SET hnsw.ef_search = 100;

-- Expected: <10ms p99, ~96% recall
```

RAG / Knowledge Base (High Accuracy)

```sql
-- Index configuration
CREATE INDEX idx_rag ON knowledge_base USING hnsw (embedding vector_cosine_ops) WITH (m = 32, ef_construction = 400);

-- Runtime settings
SET hnsw.ef_search = 200;

-- Expected: <15ms p99, ~98% recall
```

Legal / Medical (Maximum Recall)

```sql
-- Index configuration
CREATE INDEX idx_compliance ON legal_docs USING hnsw (embedding vector_cosine_ops) WITH (m = 48, ef_construction = 600);

-- Runtime settings
SET hnsw.ef_search = 400;

-- Expected: <30ms p99, ~99% recall
```

Billion-Scale (Memory Optimized)

```sql
-- Index configuration (IVF-PQ)
CREATE INDEX idx_billion ON large_dataset USING ivfpq (embedding vector_l2_ops) WITH (lists = 32768, pq_segments = 32);

-- Runtime settings
SET ivf.nprobe = 64;

-- Expected: <50ms p99, ~90% recall, 10x less memory
```

Troubleshooting
Problem: Low Recall
Symptoms: Missing relevant results
Solutions:
- Increase `ef_search` (runtime): `SET hnsw.ef_search = 200;`
- Rebuild the index with higher `ef_construction`: `DROP INDEX idx; CREATE INDEX idx ... WITH (ef_construction = 400);`
- Increase `M` (requires rebuild): `DROP INDEX idx; CREATE INDEX idx ... WITH (m = 32);`
- Check if vectors are normalized (for cosine similarity)
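The normalization check is easy to run client-side before insert; a plain-Python sketch (no external dependencies) that tests and repairs unit length:

```python
import math

def l2_norm(vec):
    return math.sqrt(sum(x * x for x in vec))

def is_normalized(vec, tol=1e-4):
    """True if the vector has unit L2 norm, which cosine-distance
    indexes generally expect."""
    return abs(l2_norm(vec) - 1.0) < tol

def normalize(vec):
    n = l2_norm(vec)
    return [x / n for x in vec]

v = [3.0, 4.0]                                         # norm 5.0 -> not normalized
print(is_normalized(v), is_normalized(normalize(v)))   # False True
```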
Problem: Slow Queries
Symptoms: High latency (>50ms)
Solutions:
- Verify the index is being used: `EXPLAIN SELECT ... ORDER BY embedding <=> ...;`
- Reduce `ef_search`: `SET hnsw.ef_search = 50;`
- Check memory pressure: `SELECT * FROM pg_stat_bgwriter;`
- Consider IVF for very large datasets
Problem: High Memory Usage
Symptoms: OOM errors, swap usage
Solutions:
- Reduce the `M` parameter: `DROP INDEX idx; CREATE INDEX idx ... WITH (m = 8);`
- Switch to IVF-PQ: `CREATE INDEX idx USING ivfpq ... WITH (pq_segments = 32);`
- Shard across multiple nodes
Problem: Index Build Too Slow
Symptoms: Index creation takes hours
Solutions:
- Reduce `ef_construction`: `CREATE INDEX idx ... WITH (ef_construction = 100);`
- Increase parallelism: `SET max_parallel_maintenance_workers = 8;`
- Build on subset first, then extend
Best Practices Summary
- Start with defaults: M=16, ef_construction=200, ef_search=50
- Measure recall: Verify on representative queries before production
- Tune ef_search first: Cheapest change for accuracy vs speed
- Monitor regularly: Track latency percentiles and recall
- Rebuild periodically: After significant data changes
- Match distance metric: Cosine for text, L2 for images
- Consider hybrid indexes: Different settings for different access patterns
Status: Production Ready Version: v7.0 Last Updated: January 2026