Skip to content

Vector Search Quick Start Guide

Vector Search Quick Start Guide

Version: v7.0 Time to Complete: 15 minutes


Prerequisites

  • HeliosDB v7.0+ installed and running
  • Basic SQL knowledge
  • (Optional) Python or Rust for embedding generation

Step 1: Create a Table with Vector Column

Create a table to store documents with their vector embeddings:

-- Create table with vector column
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
title VARCHAR(255),
content TEXT,
category VARCHAR(50),
embedding VECTOR(384) -- 384-dimensional vector
);

The VECTOR(n) type supports dimensions from 1 to 65,536. Common sizes:

  • 384: all-MiniLM-L6-v2 (fast, general purpose)
  • 768: BERT-base, RoBERTa
  • 1536: OpenAI text-embedding-ada-002

Step 2: Insert Vector Data

Method 1: Direct Vector Literals

-- Insert with vector literal (bracket notation)
INSERT INTO documents (title, content, category, embedding)
VALUES (
'Introduction to Machine Learning',
'Machine learning is a subset of artificial intelligence...',
'AI',
'[0.021, -0.043, 0.087, ...]' -- 384 values
);
-- Insert multiple rows
INSERT INTO documents (title, content, category, embedding) VALUES
('Neural Networks Basics', 'Deep learning uses neural...', 'AI',
'[0.032, -0.051, 0.094, ...]'),
('Database Indexing', 'Indexes improve query performance...', 'DB',
'[-0.015, 0.072, -0.038, ...]'),
('Vector Search Guide', 'Vector similarity enables semantic...', 'Search',
'[0.045, -0.028, 0.061, ...]');

Method 2: With Explicit CAST

-- Using explicit type cast
INSERT INTO documents (title, embedding)
VALUES (
'Deep Learning Overview',
'[0.1, 0.2, 0.3, 0.4]'::VECTOR(4)
);

Method 3: Auto-Detection

When inserting into a known vector column, the type is auto-detected:

-- Vector format is automatically detected
INSERT INTO documents (id, title, embedding)
VALUES (100, 'Auto-detected Vector', '[0.1, 0.2, 0.3, 0.4]');

Method 4: Scientific Notation

-- Scientific notation for precision
INSERT INTO documents (title, embedding)
VALUES (
'Precision Example',
'[1.5e-3, -2.7e-2, 9.8e-1, 4.2e-4]'
);

Step 3: Create a Vector Index

Create an HNSW index for fast similarity search:

-- Create HNSW index for cosine similarity
CREATE INDEX idx_documents_embedding ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);

Index Options

Operator Classes:

  • vector_cosine_ops: Cosine distance (best for text embeddings)
  • vector_l2_ops: Euclidean distance (best for images)
  • vector_ip_ops: Inner product (for normalized vectors)

Parameters:

  • m: Number of connections per node (default: 16, range: 4-64)
  • ef_construction: Build quality (default: 200, range: 50-500)
-- High accuracy index
CREATE INDEX idx_high_accuracy ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 400);
-- Fast, lower memory index
CREATE INDEX idx_fast ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 8, ef_construction = 100);

Find Most Similar Documents

-- Define query vector (your search embedding)
-- In practice, generate this from an embedding model
-- Search for 10 most similar documents
SELECT
id,
title,
category,
embedding <=> '[0.025, -0.041, 0.089, ...]'::VECTOR AS distance
FROM documents
ORDER BY embedding <=> '[0.025, -0.041, 0.089, ...]'::VECTOR
LIMIT 10;

Distance Operators

-- Cosine distance (0 = identical, 2 = opposite)
SELECT * FROM documents
ORDER BY embedding <=> query_vector
LIMIT 10;
-- L2/Euclidean distance (0 = identical)
SELECT * FROM documents
ORDER BY embedding <-> query_vector
LIMIT 10;
-- Inner product (negative, sort ascending for similarity)
SELECT * FROM documents
ORDER BY embedding <#> query_vector
LIMIT 10;

Step 5: Filtering with Vectors

Combine vector search with traditional SQL filters:

Filter by Category

-- Search within a specific category
SELECT id, title, embedding <=> query_vector AS distance
FROM documents
WHERE category = 'AI'
ORDER BY embedding <=> query_vector
LIMIT 10;

Filter by Multiple Conditions

-- Complex filtering with vector search
SELECT
id,
title,
category,
embedding <=> query_vector AS distance
FROM documents
WHERE category IN ('AI', 'ML')
AND LENGTH(content) > 100
AND created_at > '2024-01-01'
ORDER BY embedding <=> query_vector
LIMIT 10;

Distance Threshold Filter

-- Only return results within distance threshold
SELECT id, title
FROM documents
WHERE embedding <=> query_vector < 0.5 -- Cosine distance < 0.5
ORDER BY embedding <=> query_vector;

Step 6: Working with Query Vectors

Using Variables

-- Store query vector in variable
WITH query AS (
SELECT '[0.025, -0.041, 0.089, ...]'::VECTOR AS vec
)
SELECT
d.id,
d.title,
d.embedding <=> q.vec AS distance
FROM documents d, query q
ORDER BY d.embedding <=> q.vec
LIMIT 10;

Using Subqueries

-- Search for documents similar to another document
SELECT
d2.id,
d2.title,
d1.embedding <=> d2.embedding AS similarity
FROM documents d1, documents d2
WHERE d1.id = 1 -- Reference document
AND d2.id != 1 -- Exclude self
ORDER BY d1.embedding <=> d2.embedding
LIMIT 10;

Step 7: Batch Operations

Batch Insert

-- Efficient batch insert
INSERT INTO documents (title, content, embedding)
SELECT
'Document ' || i,
'Content for document ' || i,
-- In practice, use real embeddings
(SELECT ARRAY_AGG(random())::VECTOR FROM generate_series(1, 384))
FROM generate_series(1, 1000) AS i;
-- Search multiple queries at once
WITH queries AS (
SELECT 1 AS query_id, '[0.1, 0.2, ...]'::VECTOR AS vec
UNION ALL
SELECT 2, '[0.3, 0.4, ...]'::VECTOR
UNION ALL
SELECT 3, '[0.5, 0.6, ...]'::VECTOR
)
SELECT
q.query_id,
d.id AS doc_id,
d.title,
d.embedding <=> q.vec AS distance
FROM queries q
CROSS JOIN LATERAL (
SELECT id, title, embedding
FROM documents
ORDER BY embedding <=> q.vec
LIMIT 5
) d
ORDER BY q.query_id, distance;

Step 8: Verify Search Results

Check Index Usage

-- Verify index is being used
EXPLAIN ANALYZE
SELECT id, title
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::VECTOR
LIMIT 10;
-- Look for "Index Scan using idx_documents_embedding"

Get Index Statistics

-- View index details
SELECT
indexname,
indexdef,
pg_size_pretty(pg_relation_size(indexname::regclass)) AS size
FROM pg_indexes
WHERE tablename = 'documents';

Complete Example Session

Here is a complete working example you can run:

-- 1. Create table
CREATE TABLE demo_docs (
id SERIAL PRIMARY KEY,
title TEXT,
embedding VECTOR(3)
);
-- 2. Insert sample data
INSERT INTO demo_docs (title, embedding) VALUES
('AI Basics', '[0.9, 0.1, 0.0]'),
('Machine Learning', '[0.8, 0.2, 0.0]'),
('Deep Learning', '[0.7, 0.3, 0.0]'),
('Database Systems', '[0.1, 0.1, 0.9]'),
('SQL Queries', '[0.0, 0.2, 0.8]');
-- 3. Create index
CREATE INDEX idx_demo_embedding ON demo_docs
USING hnsw (embedding vector_cosine_ops);
-- 4. Search for AI-related documents
SELECT
title,
embedding <=> '[1.0, 0.0, 0.0]'::VECTOR AS distance
FROM demo_docs
ORDER BY embedding <=> '[1.0, 0.0, 0.0]'::VECTOR
LIMIT 3;
-- Expected results:
-- AI Basics | 0.0557
-- Machine Learning | 0.1180
-- Deep Learning | 0.1910
-- 5. Search with filter
SELECT title, embedding <=> '[1.0, 0.0, 0.0]'::VECTOR AS distance
FROM demo_docs
WHERE title LIKE '%Learning%'
ORDER BY embedding <=> '[1.0, 0.0, 0.0]'::VECTOR;
-- 6. Cleanup
DROP TABLE demo_docs;

Common Issues and Solutions

Issue: “Vector dimension mismatch”

-- Error: dimension mismatch
INSERT INTO documents (embedding) VALUES ('[0.1, 0.2]'); -- Wrong!
-- Solution: Match declared dimension
-- If table has VECTOR(384), provide exactly 384 values
INSERT INTO documents (embedding) VALUES ('[0.1, 0.2, ..., 0.384]');

Issue: Slow queries without index

-- Check if index exists
SELECT indexname FROM pg_indexes WHERE tablename = 'documents';
-- Create index if missing
CREATE INDEX IF NOT EXISTS idx_docs_embedding ON documents
USING hnsw (embedding vector_cosine_ops);

Issue: Low recall (missing relevant results)

-- Increase ef_search parameter for better recall
SET hnsw.ef_search = 200; -- Default is 40
-- Then run your query
SELECT * FROM documents
ORDER BY embedding <=> query_vector
LIMIT 10;

Next Steps

  1. Learn Index Tuning: See INDEXING_GUIDE.md for advanced configuration
  2. Explore Use Cases: Check EXAMPLES.md for production patterns
  3. Add Hybrid Search: Combine vectors with keyword search
  4. Scale Up: Configure distributed vector search for larger datasets

Quick Reference

Vector Insert Syntax

INSERT INTO table (vector_col) VALUES ('[0.1, 0.2, 0.3]');
INSERT INTO table (vector_col) VALUES ('[0.1, 0.2, 0.3]'::VECTOR(3));

Distance Operators

OperatorMetricBest For
<=>CosineText embeddings
<->L2Image embeddings
<#>Inner ProductNormalized vectors

Index Creation

CREATE INDEX name ON table USING hnsw (column vector_cosine_ops)
WITH (m = 16, ef_construction = 200);
SELECT * FROM table ORDER BY embedding <=> query_vector LIMIT k;

Status: Production Ready Version: v7.0 Last Updated: January 2026