SQL-Native RAG User Guide
HeliosDB: The First Database with SQL-Native RAG
Table of Contents
- Introduction
- Quick Start
- Core Concepts
- Creating Embedding Functions
- Creating RAG Pipelines
- Adding Documents
- Querying with RAG
- Advanced Features
- Best Practices
- Troubleshooting
- Migration Guide
- Performance Tuning
Introduction
What is SQL-Native RAG?
HeliosDB is the first database to provide Retrieval-Augmented Generation (RAG) directly through SQL. No external frameworks, no Python glue code, no complex orchestration—just pure SQL.
Traditional RAG Stack:
```python
# 50+ lines of Python boilerplate
vector_store = Chroma(...)
embeddings = OpenAIEmbeddings(...)
llm = ChatOpenAI(...)
retriever = vector_store.as_retriever()
rag_chain = RetrievalQA.from_chain_type(llm, retriever)
result = rag_chain.run("What is HeliosDB?")
```

HeliosDB SQL-Native RAG:
```sql
-- 3 lines of SQL
CREATE RAG PIPELINE docs WITH (llm_model = 'openai/gpt-4');
INSERT INTO docs (text) VALUES ('HeliosDB documentation...');
SELECT rag_query('What is HeliosDB?', 'docs');
```

Key Benefits
- Pure SQL: No code required, works with any SQL client
- Database-Native: Vector search, embeddings, and LLMs integrated at the database level
- Production-Ready: Built-in caching, rate limiting, retry logic, and monitoring
- Multi-Provider: OpenAI, Anthropic, Cohere, Voyage AI, and local models
- Scalable: Automatic batch processing, parallel query execution
- Cost-Optimized: Smart caching reduces API costs by 70-90%
Quick Start
5-Minute Tutorial
```sql
-- Step 1: Create an embedding function
CREATE EMBEDDING FUNCTION embed_text
USING PROVIDER 'openai'
MODEL 'text-embedding-3-small'
OPTIONS (cache_ttl = '7 days', batch_size = 100);

-- Step 2: Create a RAG pipeline
CREATE RAG PIPELINE knowledge_base
WITH (
    embedding_function = 'embed_text',
    chunking_strategy = 'semantic',
    retrieval_k = 5,
    llm_model = 'openai/gpt-4',
    enable_reranking = true
);

-- Step 3: Add documents
INSERT INTO knowledge_base (text, metadata)
VALUES
    ('HeliosDB is a next-generation autonomous database...',
     '{"source": "overview.md", "section": "intro"}'),
    ('The vector search feature provides HNSW indexing...',
     '{"source": "features.md", "section": "vector"}');

-- Step 4: Query with RAG
SELECT rag_query('What is HeliosDB?', 'knowledge_base');

-- Result:
-- {
--   "answer": "HeliosDB is a next-generation autonomous database...",
--   "sources": [
--     {"text": "HeliosDB is...", "score": 0.92},
--     {"text": "The vector search...", "score": 0.87}
--   ],
--   "metadata": {"chunks_retrieved": 2, "tokens_used": 150}
-- }
```

That’s it! You now have a production-ready RAG system.
Core Concepts
1. Embedding Functions
Embedding functions convert text into vector representations. They are:
- Reusable: Create once, use everywhere
- Cached: Automatic caching to reduce API costs
- Batched: Automatic batch processing for efficiency
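Because a registered function is a named database object, one registration can serve pipelines and ad-hoc queries alike, and repeated inputs are answered from cache instead of re-billed. A minimal sketch (the `support_kb` pipeline and `articles` table are illustrative names, not objects defined elsewhere in this guide):

```sql
-- One registered function, reused everywhere
CREATE EMBEDDING FUNCTION embed_text
USING PROVIDER 'openai'
MODEL 'text-embedding-3-small'
OPTIONS (cache_ttl = '7 days', batch_size = 100);

-- Reused by a pipeline...
CREATE RAG PIPELINE support_kb
WITH (
    embedding_function = 'embed_text',
    llm_model = 'openai/gpt-4'
);

-- ...and called directly; identical inputs hit the cache
SELECT embed_text('How do I reset my password?');
SELECT embed_text(title) FROM articles;  -- batched automatically
```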
2. RAG Pipelines
A RAG pipeline orchestrates the entire RAG workflow:
- Chunking: Splits documents into semantic chunks
- Embedding: Generates embeddings for each chunk
- Storage: Stores chunks in vector database
- Retrieval: Finds relevant chunks via hybrid search
- Reranking (optional): Re-scores chunks for relevance
- Generation: Synthesizes answer using LLM
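All of these stages run automatically on INSERT and on rag_query; you never invoke them individually. Purely to illustrate what the pipeline automates, the sketch below steps through the externally visible stages by hand, using functions shown later in this guide (storage and reranking happen internally and have no standalone SQL surface):

```sql
-- Chunking: preview how a document would be split
SELECT chunk_document('Long document text...', 'semantic');

-- Embedding: vectorize a single chunk (embed_text from the Quick Start)
SELECT embed_text('One chunk of text...');

-- Retrieval: fetch the top-k relevant chunks for a question
SELECT * FROM vector_search('What is HeliosDB?', 'knowledge_base', k => 5);

-- Generation: rag_query runs the whole chain end to end
SELECT rag_query('What is HeliosDB?', 'knowledge_base');
```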
3. Hybrid Search
HeliosDB uses hybrid search combining:
- Vector Search: Semantic similarity via HNSW
- Full-Text Search: Keyword matching via BM25
- Metadata Filtering: Structured attribute filters
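There is no separate hybrid-search syntax to learn: rag_query applies all three signals automatically. Conceptually, retrieval behaves like the sketch below; note that `bm25_search`, the column names, and the 0.7/0.3 fusion weights are hypothetical illustrations of score fusion, not part of the public API:

```sql
-- Conceptual sketch only: bm25_search, the chunk_id/score columns, and
-- the fusion weights are hypothetical; vector_search appears later in
-- this guide under Troubleshooting.
WITH vec AS (   -- semantic similarity via HNSW
    SELECT chunk_id, score
    FROM vector_search('backup strategy', 'kb', k => 20)
),
kw AS (         -- keyword matching via BM25
    SELECT chunk_id, score
    FROM bm25_search('backup strategy', 'kb', k => 20)
)
SELECT chunk_id,
       0.7 * COALESCE(vec.score, 0) + 0.3 * COALESCE(kw.score, 0) AS hybrid_score
FROM vec
FULL OUTER JOIN kw USING (chunk_id)
ORDER BY hybrid_score DESC
LIMIT 5;
```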
Creating Embedding Functions
Basic Syntax
```sql
CREATE EMBEDDING FUNCTION <function_name>
USING PROVIDER '<provider>'
MODEL '<model>'
[OPTIONS (<option_key> = '<option_value>', ...)];
```

OpenAI Embeddings
```sql
-- Small model (1536 dimensions, $0.02/1M tokens)
CREATE EMBEDDING FUNCTION embed_small
USING PROVIDER 'openai'
MODEL 'text-embedding-3-small'
OPTIONS (cache_ttl = '7 days', batch_size = 2048);

-- Large model (3072 dimensions, $0.13/1M tokens)
CREATE EMBEDDING FUNCTION embed_large
USING PROVIDER 'openai'
MODEL 'text-embedding-3-large'
OPTIONS (cache_ttl = '30 days', batch_size = 2048);
```

Cohere Embeddings
```sql
-- English embeddings
CREATE EMBEDDING FUNCTION embed_english
USING PROVIDER 'cohere'
MODEL 'embed-english-v3.0'
OPTIONS (cache_ttl = '14 days', batch_size = 96);

-- Multilingual embeddings (100+ languages)
CREATE EMBEDDING FUNCTION embed_multilingual
USING PROVIDER 'cohere'
MODEL 'embed-multilingual-v3.0'
OPTIONS (cache_ttl = '14 days', batch_size = 96);
```

Voyage AI Embeddings
```sql
-- General purpose
CREATE EMBEDDING FUNCTION embed_voyage
USING PROVIDER 'voyage'
MODEL 'voyage-large-2'
OPTIONS (cache_ttl = '7 days', batch_size = 128);

-- Code-optimized
CREATE EMBEDDING FUNCTION embed_code
USING PROVIDER 'voyage'
MODEL 'voyage-code-2'
OPTIONS (cache_ttl = '30 days', batch_size = 128);
```

Local Embeddings (No API Costs)
```sql
-- Local model via ONNX
CREATE EMBEDDING FUNCTION embed_local
USING PROVIDER 'local'
MODEL 'sentence-transformers/all-MiniLM-L6-v2'
OPTIONS (
    device = 'cuda',
    normalize = 'true',
    pooling = 'mean'
);
```

Using Embedding Functions Directly
```sql
-- Single text
SELECT embed_small('Hello, world!');

-- Batch embedding
SELECT embed_small(text) FROM documents;

-- With an ad-hoc model (no pre-registration)
SELECT embed('Hello', 'openai/text-embedding-3-small');
```

Dropping Embedding Functions
```sql
DROP EMBEDDING FUNCTION embed_text;
```

Creating RAG Pipelines
Basic Syntax
```sql
CREATE RAG PIPELINE <pipeline_name>
WITH (
    llm_model = '<provider>/<model>',
    [embedding_function = '<function_name>',]
    [chunking_strategy = '<strategy>',]
    [retrieval_k = <number>,]
    [enable_reranking = <true|false>,]
    [context_window = <size>]
);
```

OpenAI GPT-4 Pipeline
```sql
CREATE RAG PIPELINE docs_gpt4
WITH (
    embedding_function = 'embed_small',
    chunking_strategy = 'semantic',
    retrieval_k = 5,
    llm_model = 'openai/gpt-4',
    enable_reranking = true,
    context_window = 8192
);
```

Anthropic Claude Pipeline
```sql
CREATE RAG PIPELINE docs_claude
WITH (
    embedding_function = 'embed_large',
    chunking_strategy = 'semantic',
    retrieval_k = 7,
    llm_model = 'anthropic/claude-3-opus-20240229',
    enable_reranking = true,
    context_window = 200000  -- Claude's 200K context
);
```

Cost-Optimized Pipeline (GPT-3.5)
```sql
CREATE RAG PIPELINE docs_budget
WITH (
    embedding_function = 'embed_small',
    chunking_strategy = 'fixed',
    retrieval_k = 3,
    llm_model = 'openai/gpt-3.5-turbo',
    enable_reranking = false,
    context_window = 4096
);
```

Multilingual Pipeline
```sql
CREATE RAG PIPELINE docs_multilingual
WITH (
    embedding_function = 'embed_multilingual',
    chunking_strategy = 'semantic',
    retrieval_k = 5,
    llm_model = 'openai/gpt-4',
    enable_reranking = true
);
```

Chunking Strategies
| Strategy | Description | Best For |
|---|---|---|
| `semantic` | AI-powered semantic chunking | General documents |
| `fixed` | Fixed-size chunks (512 tokens) | Uniform content |
| `recursive` | Recursive splitting by headers | Structured docs |
| `markdown` | Markdown-aware chunking | Markdown files |
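Before committing to a strategy, you can preview how each one would split the same content; chunk_document (also shown under Troubleshooting) takes the strategy name as its second argument:

```sql
-- Eyeball chunk boundaries before choosing a strategy
SELECT chunk_document('# Install ... # Configure ...', 'markdown');
SELECT chunk_document('Long unstructured prose about vector search...', 'semantic');
SELECT chunk_document('Long unstructured prose about vector search...', 'fixed');
```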
Dropping RAG Pipelines
```sql
DROP RAG PIPELINE docs_gpt4;
```

Adding Documents
Basic Insert
```sql
INSERT INTO <pipeline_name> (text, metadata)
VALUES ('Document text...', '{"key": "value"}');
```

Single Document
```sql
INSERT INTO knowledge_base (text, metadata)
VALUES (
    'HeliosDB is a next-generation autonomous database with AI-native features...',
    '{"source": "overview.md", "date": "2025-11-05", "author": "HeliosDB Team"}'
);
```

Bulk Insert
```sql
INSERT INTO knowledge_base (text, metadata)
VALUES
    ('First document...', '{"source": "doc1.md"}'),
    ('Second document...', '{"source": "doc2.md"}'),
    ('Third document...', '{"source": "doc3.md"}');
```

From Existing Table
```sql
INSERT INTO knowledge_base (text, metadata)
SELECT content, json_build_object('source', filename, 'date', created_at)
FROM documents;
```

From Files (with metadata extraction)
```sql
-- PDF documents
INSERT INTO knowledge_base (text, metadata)
SELECT
    extract_text(file_path),
    json_build_object(
        'source', file_path,
        'type', 'pdf',
        'pages', extract_page_count(file_path)
    )
FROM uploaded_pdfs;

-- Markdown files
INSERT INTO knowledge_base (text, metadata)
SELECT
    read_file(file_path),
    json_build_object('source', file_path, 'type', 'markdown')
FROM markdown_files;
```

Document Updates
```sql
-- Update document content (re-embeds automatically)
UPDATE knowledge_base
SET text = 'Updated document content...'
WHERE metadata->>'source' = 'overview.md';

-- Update metadata only (no re-embedding)
UPDATE knowledge_base
SET metadata = metadata || '{"reviewed": true}'
WHERE metadata->>'source' = 'overview.md';
```

Document Deletion
```sql
-- Delete by metadata
DELETE FROM knowledge_base
WHERE metadata->>'source' = 'old_doc.md';

-- Delete all documents
TRUNCATE TABLE knowledge_base;
```

Querying with RAG
Basic Query
```sql
SELECT rag_query('<question>', '<pipeline_name>');
```

Simple Questions
```sql
-- Basic query
SELECT rag_query('What is HeliosDB?', 'knowledge_base');

-- Follow-up question
SELECT rag_query('What are its key features?', 'knowledge_base');

-- Complex question
SELECT rag_query(
    'How does HeliosDB compare to PostgreSQL for vector search?',
    'knowledge_base'
);
```

Query with Options
```sql
SELECT rag_query(
    '<question>',
    '<pipeline>',
    retrieval_k => <number>,
    enable_reranking => <true|false>,
    system_prompt => '<custom prompt>',
    filters => '<json filters>',
    temperature => <float>
);
```

Advanced Queries
```sql
-- Retrieve more chunks
SELECT rag_query(
    'Explain vector search in detail',
    'knowledge_base',
    retrieval_k => 10
);

-- Disable reranking (faster, lower cost)
SELECT rag_query(
    'Quick question',
    'knowledge_base',
    enable_reranking => false
);

-- Custom system prompt
SELECT rag_query(
    'What is HeliosDB?',
    'knowledge_base',
    system_prompt => 'You are a database expert. Provide technical details.'
);

-- Metadata filtering
SELECT rag_query(
    'What are the new features?',
    'knowledge_base',
    filters => '{"source": "changelog.md", "date": {"$gte": "2025-01-01"}}'
);

-- Adjust LLM temperature
SELECT rag_query(
    'Tell me about HeliosDB',
    'knowledge_base',
    temperature => 0.7
);
```

Analyzing Results
```sql
-- Extract answer only
SELECT (rag_query('What is HeliosDB?', 'kb'))->>'answer';

-- Extract sources
SELECT jsonb_array_elements(
    (rag_query('What is HeliosDB?', 'kb'))->'sources'
);

-- Extract metadata
SELECT (rag_query('What is HeliosDB?', 'kb'))->'metadata';

-- Count tokens used
SELECT (rag_query('What is HeliosDB?', 'kb'))->'metadata'->>'tokens_used';
```

Advanced Features
1. Multi-Step RAG Queries
```sql
-- Step 1: Gather context
WITH context AS (
    SELECT rag_query('What is vector search?', 'kb') AS result
),
-- Step 2: Deep dive
deep_dive AS (
    SELECT rag_query(
        'Explain HNSW algorithm in the context above',
        'kb',
        system_prompt => 'Use this context: ' || (SELECT result->>'answer' FROM context)
    ) AS result
)
SELECT * FROM deep_dive;
```

2. Cross-Pipeline Queries
```sql
-- Query multiple knowledge bases
SELECT 'technical' AS source,
       rag_query('What is HeliosDB?', 'technical_docs') AS answer
UNION ALL
SELECT 'marketing' AS source,
       rag_query('What is HeliosDB?', 'marketing_docs') AS answer;
```

3. Conditional RAG
```sql
-- Use different pipelines based on question type
SELECT CASE
    WHEN question LIKE '%technical%' THEN rag_query(question, 'technical_pipeline')
    WHEN question LIKE '%pricing%' THEN rag_query(question, 'sales_pipeline')
    ELSE rag_query(question, 'general_pipeline')
END AS answer
FROM user_questions;
```

4. RAG with Function Composition
```sql
-- Translate, then RAG
SELECT rag_query(
    translate(user_input, 'auto', 'en'),
    'english_kb'
);

-- Summarize RAG results
SELECT summarize(
    (rag_query('Explain HeliosDB architecture', 'kb'))->>'answer',
    max_length => 100
);
```

5. Hybrid Queries (RAG + Traditional SQL)
```sql
-- RAG + analytics
WITH rag_result AS (
    SELECT rag_query('What are common support issues?', 'support_kb') AS result
),
ticket_stats AS (
    SELECT category, COUNT(*) AS count, AVG(resolution_time) AS avg_time
    FROM support_tickets
    GROUP BY category
)
SELECT
    (SELECT result->>'answer' FROM rag_result) AS ai_summary,
    json_agg(ticket_stats) AS statistics
FROM ticket_stats;
```

6. Streaming Responses
```sql
-- Stream tokens as they're generated (vendor extension)
SELECT rag_query_stream('Tell me about HeliosDB', 'kb');
```

Best Practices
1. Choosing Embedding Models
For General Text:
- Best Quality: `openai/text-embedding-3-large` (3072D, $0.13/1M)
- Best Value: `openai/text-embedding-3-small` (1536D, $0.02/1M)
- Multilingual: `cohere/embed-multilingual-v3.0` (1024D)
- No Cost: `local/all-MiniLM-L6-v2` (384D)
For Code:
- Best: `voyage/voyage-code-2` (1536D)
- Alternative: `openai/text-embedding-3-large`
For Low Latency:
- Use local models with GPU acceleration
- Enable aggressive caching (30+ days)
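Both recommendations combine naturally in one function. A sketch, assuming cache_ttl is honored for local providers just as it is for API-backed ones:

```sql
-- Low-latency setup: local GPU model plus a long-lived cache
-- (cache_ttl on a local provider is an assumption here)
CREATE EMBEDDING FUNCTION embed_low_latency
USING PROVIDER 'local'
MODEL 'sentence-transformers/all-MiniLM-L6-v2'
OPTIONS (
    device = 'cuda',        -- GPU acceleration
    cache_ttl = '30 days',  -- aggressive caching
    batch_size = 512
);
```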
2. Choosing LLM Models
For Complex Reasoning:
- `anthropic/claude-3-opus-20240229` (200K context)
- `openai/gpt-4-turbo-preview` (128K context)
For Speed & Cost:
- `openai/gpt-3.5-turbo` (16K context, 10x cheaper)
- `anthropic/claude-3-haiku` (200K context, fast)
For Code Generation:
- `openai/gpt-4-turbo-preview`
- `anthropic/claude-3-opus`
3. Optimizing Retrieval
```sql
-- Start with k=3-5 for most queries
retrieval_k => 5

-- Increase for complex questions
retrieval_k => 10

-- Decrease for simple lookups
retrieval_k => 3

-- Enable reranking for quality
enable_reranking => true

-- Disable reranking for speed/cost
enable_reranking => false
```

4. Chunking Strategy Selection
| Content Type | Strategy | Why |
|---|---|---|
| Technical docs | semantic | Preserves meaning |
| Books/articles | recursive | Respects structure |
| Code | fixed | Consistent sizes |
| Markdown | markdown | Section-aware |
| Mixed | semantic | Most flexible |
5. Metadata Best Practices
A good metadata structure:

```json
{
    "source": "docs/architecture.md",
    "section": "Vector Search",
    "date": "2025-11-05",
    "author": "Engineering Team",
    "tags": ["vector", "search", "performance"],
    "version": "6.0",
    "reviewed": true
}
```
```sql
-- Use metadata for filtering
SELECT rag_query(
    'What are the new features?',
    'docs',
    filters => '{"version": "6.0", "reviewed": true}'
);
```

6. Cost Optimization
```sql
-- Aggressive caching (70-90% cost reduction)
CREATE EMBEDDING FUNCTION embed_cached
USING PROVIDER 'openai'
MODEL 'text-embedding-3-small'
OPTIONS (cache_ttl = '30 days', batch_size = 2048);

-- Use cheaper models for simple queries
CREATE RAG PIPELINE budget_rag
WITH (
    llm_model = 'openai/gpt-3.5-turbo',
    retrieval_k = 3,
    enable_reranking = false
);

-- Local embeddings (zero API cost)
CREATE EMBEDDING FUNCTION embed_free
USING PROVIDER 'local'
MODEL 'all-MiniLM-L6-v2';
```

7. Quality Optimization
```sql
-- High-quality pipeline
CREATE RAG PIPELINE premium_rag
WITH (
    embedding_function = 'embed_large',
    chunking_strategy = 'semantic',
    retrieval_k = 10,
    llm_model = 'anthropic/claude-3-opus',
    enable_reranking = true,
    context_window = 200000
);
```

Troubleshooting
Empty Results
Problem: RAG returns “I don’t have enough information…”
Solutions:
```sql
-- 1. Check if documents exist
SELECT COUNT(*) FROM knowledge_base;

-- 2. Test retrieval directly
SELECT * FROM vector_search('your query', 'knowledge_base', k => 10);

-- 3. Increase retrieval_k
SELECT rag_query('query', 'kb', retrieval_k => 10);

-- 4. Check metadata filters
SELECT rag_query('query', 'kb', filters => '{}'); -- Remove filters

-- 5. Verify chunking
SELECT chunk_document('Sample text', 'semantic');
```

Poor Quality Answers
Problem: Answers are vague or incorrect
Solutions:
```sql
-- 1. Enable reranking
SELECT rag_query('query', 'kb', enable_reranking => true);

-- 2. Use a larger embedding model
CREATE EMBEDDING FUNCTION embed_better
USING PROVIDER 'openai'
MODEL 'text-embedding-3-large';

-- 3. Use a better LLM
CREATE RAG PIPELINE better_rag
WITH (llm_model = 'openai/gpt-4');

-- 4. Add a custom system prompt
SELECT rag_query(
    'query',
    'kb',
    system_prompt => 'You are an expert. Provide detailed, accurate answers.'
);

-- 5. Increase context
SELECT rag_query('query', 'kb', retrieval_k => 15);
```

High Latency
Problem: Queries take too long
Solutions:
```sql
-- 1. Reduce retrieval_k
SELECT rag_query('query', 'kb', retrieval_k => 3);

-- 2. Disable reranking
SELECT rag_query('query', 'kb', enable_reranking => false);

-- 3. Use a faster LLM
CREATE RAG PIPELINE fast_rag
WITH (llm_model = 'openai/gpt-3.5-turbo');

-- 4. Enable caching
CREATE EMBEDDING FUNCTION embed_fast
USING PROVIDER 'openai'
MODEL 'text-embedding-3-small'
OPTIONS (cache_ttl = '7 days');

-- 5. Use local embeddings
CREATE EMBEDDING FUNCTION embed_local
USING PROVIDER 'local'
MODEL 'all-MiniLM-L6-v2'
OPTIONS (device = 'cuda');
```

High Costs
Problem: API costs are too high
Solutions:
```sql
-- 1. Enable aggressive caching
OPTIONS (cache_ttl = '30 days', batch_size = 2048)

-- 2. Use cheaper models
MODEL 'text-embedding-3-small'        -- vs text-embedding-3-large
llm_model = 'openai/gpt-3.5-turbo'    -- vs gpt-4

-- 3. Reduce retrieval_k
retrieval_k => 3

-- 4. Use local embeddings
USING PROVIDER 'local'

-- 5. Monitor usage
SELECT
    function_name,
    COUNT(*) AS calls,
    SUM(tokens_used) AS total_tokens,
    SUM(cost) AS total_cost
FROM embedding_usage
GROUP BY function_name;
```

Debugging
```sql
-- Check pipeline configuration
SELECT * FROM rag_pipelines WHERE name = 'knowledge_base';

-- List embedding functions
SELECT * FROM embedding_functions;

-- Check document count
SELECT COUNT(*) FROM knowledge_base;

-- View cache stats
SELECT * FROM embedding_cache_stats;

-- Monitor query performance
SELECT query, execution_time, chunks_retrieved, tokens_used, cost
FROM rag_query_log
ORDER BY execution_time DESC
LIMIT 10;
```

Migration Guide
From LangChain (Python)
Before (LangChain):
```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# Setup (50+ lines)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vector_store = Chroma(persist_directory="./db", embedding_function=embeddings)
llm = ChatOpenAI(model="gpt-4", temperature=0)
retriever = vector_store.as_retriever(search_kwargs={"k": 5})
rag_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)

# Add documents
vector_store.add_texts(["doc1", "doc2", "doc3"])

# Query
result = rag_chain.run("What is HeliosDB?")
```

After (HeliosDB SQL):
```sql
-- Setup (3 lines)
CREATE EMBEDDING FUNCTION embed USING PROVIDER 'openai' MODEL 'text-embedding-3-small';
CREATE RAG PIPELINE docs WITH (llm_model = 'openai/gpt-4', retrieval_k = 5);

-- Add documents (1 line)
INSERT INTO docs (text) VALUES ('doc1'), ('doc2'), ('doc3');

-- Query (1 line)
SELECT rag_query('What is HeliosDB?', 'docs');
```

From pgvector + Custom Code
Before (pgvector + Python):
```python
import openai
import psycopg2

# Manual embedding
response = openai.Embedding.create(input="query", model="text-embedding-3-small")
query_vector = response['data'][0]['embedding']

# Manual vector search
conn = psycopg2.connect(...)
cur = conn.cursor()
cur.execute("""
    SELECT text FROM documents
    ORDER BY embedding <=> %s
    LIMIT 5
""", (query_vector,))
results = cur.fetchall()

# Manual LLM call
context = "\n".join([r[0] for r in results])
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Use this context: " + context},
        {"role": "user", "content": "What is HeliosDB?"}
    ]
)
answer = response['choices'][0]['message']['content']
```

After (HeliosDB SQL):
```sql
-- All in one query
SELECT rag_query('What is HeliosDB?', 'docs');
```

From Elasticsearch + LLM
Before:
```javascript
// Elasticsearch search
const searchResults = await esClient.search({
    index: 'documents',
    body: {
        query: { match: { text: 'HeliosDB' } },
        size: 5
    }
});

// Extract context
const context = searchResults.hits.hits.map(h => h._source.text).join('\n');

// LLM call
const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [
        { role: 'system', content: `Context: ${context}` },
        { role: 'user', content: 'What is HeliosDB?' }
    ]
});
```

After:
```sql
SELECT rag_query('What is HeliosDB?', 'docs');
```

Performance Tuning
1. Embedding Performance
```sql
-- Baseline: ~100ms per text
CREATE EMBEDDING FUNCTION embed_baseline
USING PROVIDER 'openai' MODEL 'text-embedding-3-small';

-- Optimized: ~10ms per text (10x faster)
CREATE EMBEDDING FUNCTION embed_optimized
USING PROVIDER 'openai' MODEL 'text-embedding-3-small'
OPTIONS (
    cache_ttl = '30 days',  -- Hit rate: 70-90%
    batch_size = 2048       -- Max batch size
);

-- Local: ~1ms per text (100x faster, GPU required)
CREATE EMBEDDING FUNCTION embed_local
USING PROVIDER 'local' MODEL 'all-MiniLM-L6-v2'
OPTIONS (device = 'cuda', batch_size = 512);
```

2. Query Performance
```sql
-- Fast query (<500ms)
SELECT rag_query(
    'Simple question?',
    'docs',
    retrieval_k => 3,              -- Fewer chunks
    enable_reranking => false,     -- Skip reranking
    llm_model => 'gpt-3.5-turbo'   -- Faster model (override)
);

-- Quality query (1-3s)
SELECT rag_query(
    'Complex question?',
    'docs',
    retrieval_k => 10,             -- More chunks
    enable_reranking => true,      -- Better ranking
    llm_model => 'gpt-4'           -- Better quality
);

-- Streaming query (immediate response)
SELECT rag_query_stream('Question?', 'docs');
```

3. Indexing Performance
```sql
-- Slow: One at a time
INSERT INTO docs (text) VALUES ('doc1');
INSERT INTO docs (text) VALUES ('doc2');
-- etc...

-- Fast: Bulk insert
INSERT INTO docs (text) VALUES ('doc1'), ('doc2'), ('doc3'), ...;

-- Faster: Disable indexing, bulk insert, re-enable
ALTER TABLE docs DISABLE TRIGGER ALL;
INSERT INTO docs (text) SELECT content FROM source_table;
ALTER TABLE docs ENABLE TRIGGER ALL;
```

4. Cache Configuration
```sql
-- Monitoring cache performance
SELECT
    function_name,
    cache_hits,
    cache_misses,
    cache_hit_rate,
    cache_size_mb,
    cache_evictions
FROM embedding_cache_stats;
```
```sql
-- Optimal cache TTL by use case:
-- - Static documentation: 30+ days
-- - Frequently updated: 1-7 days
-- - Real-time data: 1-24 hours
-- - One-time queries: 0 (disable)

-- (provider and model added for completeness; use any supported pair)
CREATE EMBEDDING FUNCTION embed_static
USING PROVIDER 'openai'
MODEL 'text-embedding-3-small'
OPTIONS (cache_ttl = '30 days');  -- Static docs

CREATE EMBEDDING FUNCTION embed_dynamic
USING PROVIDER 'openai'
MODEL 'text-embedding-3-small'
OPTIONS (cache_ttl = '6 hours');  -- Dynamic content
```

5. Benchmarking
```sql
-- Query latency breakdown
EXPLAIN ANALYZE
SELECT rag_query('What is HeliosDB?', 'docs');

-- Results:
-- - Embedding: 15ms (cached)
-- - Vector search: 8ms
-- - Reranking: 45ms
-- - LLM generation: 1200ms
-- - Total: 1268ms

-- Optimize based on the bottleneck:
-- - Slow embedding → enable caching or use local models
-- - Slow vector search → optimize index, reduce retrieval_k
-- - Slow reranking → disable if not critical
-- - Slow LLM → use faster model or reduce context
```

Appendix
Supported Providers
| Provider | Model | Dimensions | Cost (per 1M tokens) | Best For |
|---|---|---|---|---|
| OpenAI | text-embedding-3-small | 1536 | $0.02/1M | General purpose |
| OpenAI | text-embedding-3-large | 3072 | $0.13/1M | High quality |
| Cohere | embed-english-v3.0 | 1024 | $0.10/1M | English text |
| Cohere | embed-multilingual-v3.0 | 1024 | $0.10/1M | 100+ languages |
| Voyage | voyage-large-2 | 1536 | $0.12/1M | General purpose |
| Voyage | voyage-code-2 | 1536 | $0.12/1M | Code |
| Local | all-MiniLM-L6-v2 | 384 | Free | Low latency |
| Local | BGE-large | 1024 | Free | High quality |
LLM Models
| Provider | Model | Context | Cost (per 1M tokens) | Best For |
|---|---|---|---|---|
| OpenAI | gpt-4-turbo | 128K | $10/$30 (in/out) | Quality |
| OpenAI | gpt-3.5-turbo | 16K | $0.50/$1.50 | Speed/cost |
| Anthropic | claude-3-opus | 200K | $15/$75 | Quality + context |
| Anthropic | claude-3-sonnet | 200K | $3/$15 | Balanced |
| Anthropic | claude-3-haiku | 200K | $0.25/$1.25 | Speed/cost |
Environment Variables
```bash
# API Keys
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export COHERE_API_KEY="..."
export VOYAGE_API_KEY="..."

# Configuration
export HELIOSDB_EMBEDDING_CACHE_DIR="/var/cache/heliosdb/embeddings"
export HELIOSDB_EMBEDDING_CACHE_SIZE_GB="10"
export HELIOSDB_LLM_TIMEOUT_SECONDS="30"
export HELIOSDB_LLM_MAX_RETRIES="3"
```

Configuration File
```yaml
embeddings:
  cache:
    enabled: true
    directory: /var/cache/heliosdb/embeddings
    max_size_gb: 10
    default_ttl: 604800  # 7 days

rate_limits:
  openai:
    requests_per_minute: 3000
    tokens_per_minute: 1000000
  cohere:
    requests_per_minute: 10000
  voyage:
    requests_per_minute: 300

llm:
  timeout_seconds: 30
  max_retries: 3
  retry_delay_seconds: 2
  default_options:
    temperature: 0.7
    max_tokens: 2000
    top_p: 0.9

rag:
  default_chunking_strategy: semantic
  default_retrieval_k: 5
  default_reranking: true
  max_context_tokens: 100000
```

Support & Resources
- Documentation: https://docs.heliosdb.com/rag
- API Reference: https://docs.heliosdb.com/api/rag
- Examples: https://github.com/heliosdb/examples/rag
- Discord: https://discord.gg/heliosdb
- GitHub Issues: https://github.com/heliosdb/heliosdb/issues
Last Updated: November 5, 2025
Version: 6.0
Author: HeliosDB Team
HeliosDB: The first database with SQL-native RAG. Build production AI applications with pure SQL—no frameworks required.