Skip to content

Conversational BI Architecture Design

Conversational BI Architecture Design

HeliosDB v7.0 World-First Innovation

Document Version: 1.0 Created: November 9, 2025 Status: Architecture Design - Ready for Implementation Investment: $1M over 2.5 months ARR Impact: $60M Target Accuracy: 95%+ on BIRD dataset (vs. current SOTA 68-80%)


Executive Summary

This document defines the complete architecture for HeliosDB’s Conversational BI system - a natural language to SQL interface with multi-turn context management, 95%+ accuracy, and comprehensive query explanation capabilities.

Key Differentiators:

  1. Multi-Turn Context: 10+ turn conversation memory with state management
  2. Best-in-Class Accuracy: 95%+ on BIRD dataset (vs. SOTA 68-80%)
  3. Deep Explanation: Natural language query plans with optimization suggestions
  4. Production-Ready: <2s latency, scalable, integrated with existing HeliosDB components
  5. Flexible Model Support: Local (Ollama, ONNX) and cloud (OpenAI, Anthropic, Cohere)

Table of Contents

  1. System Overview
  2. Architecture Principles
  3. Component Design
  4. Data Flow Architecture
  5. Integration Points
  6. Multi-Turn Context Management
  7. NL2SQL Engine Design
  8. Query Explanation System
  9. Model Architecture
  10. Performance Optimization
  11. Testing Strategy
  12. Implementation Roadmap
  13. Success Metrics
  14. Risk Mitigation

1. System Overview

1.1 Vision

Enable business users to interact with HeliosDB using natural language, eliminating the need for SQL expertise while maintaining the power and precision of structured queries.

1.2 Core Capabilities

┌─────────────────────────────────────────────────────────────────┐
│ Conversational BI System │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Input: "Show me revenue trends for Q4 2024 by region" │
│ "Which region had the highest growth?" │
│ "Break that down by product category" │
│ │
│ ↓ Multi-Turn Context Engine │
│ ↓ Schema-Aware NL2SQL │
│ ↓ Query Validation & Optimization │
│ ↓ Execution & Result Formatting │
│ ↓ Natural Language Explanation │
│ │
│ Output: SQL Query + Results + Explanation + Optimization Tips │
│ │
└─────────────────────────────────────────────────────────────────┘

1.3 Architecture Layers

┌──────────────────────────────────────────────────────────────────┐
│ User Interface Layer │
│ (SQL CLI, Web UI, GraphQL API, REST API, Protocol Handlers) │
└────────────────────────┬─────────────────────────────────────────┘
┌────────────────────────▼─────────────────────────────────────────┐
│ Conversational BI Engine │
│ ┌──────────────┐ ┌────────────┐ ┌─────────────────────────┐ │
│ │ Session │ │ Context │ │ Clarification │ │
│ │ Manager │ │ Tracker │ │ Engine │ │
│ └──────────────┘ └────────────┘ └─────────────────────────┘ │
└────────────────────────┬─────────────────────────────────────────┘
┌────────────────────────▼─────────────────────────────────────────┐
│ NL2SQL Engine │
│ ┌──────────────┐ ┌────────────┐ ┌─────────────────────────┐ │
│ │ Schema │ │ Query │ │ Validation & │ │
│ │ Augmenter │ │ Generator │ │ Correction │ │
│ └──────────────┘ └────────────┘ └─────────────────────────┘ │
└────────────────────────┬─────────────────────────────────────────┘
┌────────────────────────▼─────────────────────────────────────────┐
│ Query Explanation Layer │
│ ┌──────────────┐ ┌────────────┐ ┌─────────────────────────┐ │
│ │ Plan │ │ Performance│ │ Optimization │ │
│ │ Explainer │ │ Predictor │ │ Suggester │ │
│ └──────────────┘ └────────────┘ └─────────────────────────┘ │
└────────────────────────┬─────────────────────────────────────────┘
┌────────────────────────▼─────────────────────────────────────────┐
│ HeliosDB Core Integration │
│ ┌──────────────┐ ┌────────────┐ ┌─────────────────────────┐ │
│ │ heliosdb- │ │ heliosdb- │ │ heliosdb-monitoring │ │
│ │ compute │ │ schema │ │ │ │
│ └──────────────┘ └────────────┘ └─────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘

2. Architecture Principles

2.1 Design Principles

  1. Accuracy First: Prioritize correctness over speed (within latency budget)
  2. Context Awareness: Maintain full conversation history and leverage it intelligently
  3. Schema-Driven: Use database schema as ground truth for query generation
  4. Explainability: Every SQL query comes with a clear explanation
  5. Incremental Refinement: Support iterative query improvement through clarifications
  6. Model Agnostic: Support both local and cloud LLM models
  7. Production Hardened: Robust error handling, caching, monitoring

2.2 Non-Functional Requirements

RequirementTargetMeasurement
Accuracy95%+BIRD dataset execution accuracy
Latency<2s p50, <5s p99Query generation time
Context Length10+ turnsConversation depth
Throughput100+ queries/secConcurrent conversations
Availability99.9%Uptime SLA
Cache Hit Rate>80%Repeated/similar queries

2.3 Security & Privacy

  • Data Privacy: No query data sent to cloud unless explicitly configured
  • Schema Protection: Schema metadata encrypted in transit/rest
  • Access Control: Respect existing HeliosDB RBAC/ABAC policies
  • Audit Trail: All NL queries logged with user context
  • PII Detection: Automatic detection and masking of sensitive data in explanations

3. Component Design

3.1 Core Components

3.1.1 Conversational BI Engine (heliosdb-conversational-bi)

Responsibility: Orchestrate the entire NL2SQL pipeline

Key Modules:

pub struct ConversationalBiEngine {
session_manager: Arc<SessionManager>,
context_tracker: Arc<ContextTracker>,
nl2sql_engine: Arc<Nl2SqlEngine>,
explanation_engine: Arc<ExplanationEngine>,
schema_cache: Arc<SchemaCache>,
query_cache: Arc<QueryCache>,
model_router: Arc<ModelRouter>,
config: ConversationalConfig,
}
pub struct ConversationalConfig {
pub max_turns: usize, // Default: 20
pub context_window_tokens: usize, // Default: 8000
pub primary_model: ModelConfig, // Primary LLM
pub fallback_model: Option<ModelConfig>, // Fallback if primary fails
pub enable_query_cache: bool, // Default: true
pub enable_schema_cache: bool, // Default: true
pub max_concurrent_sessions: usize, // Default: 1000
pub session_timeout_minutes: u64, // Default: 30
}

API Surface:

impl ConversationalBiEngine {
// Start a new conversation
pub async fn create_session(
&self,
user_id: &str,
database: &str,
) -> Result<SessionId>;
// Process natural language query
pub async fn process_query(
&self,
session_id: SessionId,
query: &str,
) -> Result<ConversationResponse>;
// Get conversation history
pub async fn get_history(
&self,
session_id: SessionId,
) -> Result<Vec<ConversationTurn>>;
// Clear conversation context
pub async fn reset_session(
&self,
session_id: SessionId,
) -> Result<()>;
}
pub struct ConversationResponse {
pub turn_id: u32,
pub sql: Option<String>, // Generated SQL (if query was understood)
pub explanation: String, // Natural language explanation
pub results: Option<QueryResults>, // Execution results
pub clarifications: Vec<String>, // Clarifying questions (if ambiguous)
pub suggestions: Vec<String>, // Optimization suggestions
pub confidence: f64, // 0.0-1.0 confidence score
pub latency_ms: f64,
}

3.1.2 Session Manager

Responsibility: Manage conversation sessions and lifecycle

pub struct SessionManager {
sessions: Arc<DashMap<SessionId, Session>>,
persistence: Arc<SessionPersistence>,
config: SessionConfig,
}
pub struct Session {
pub id: SessionId,
pub user_id: String,
pub database: String,
pub created_at: DateTime<Utc>,
pub last_accessed: DateTime<Utc>,
pub turn_count: u32,
pub context: ConversationContext,
}
pub struct SessionConfig {
pub max_sessions_per_user: usize, // Default: 10
pub idle_timeout_minutes: u64, // Default: 30
pub persist_sessions: bool, // Default: true (for recovery)
pub cleanup_interval_minutes: u64, // Default: 5
}
impl SessionManager {
// Session lifecycle
pub async fn create(&self, user_id: &str, database: &str) -> Result<SessionId>;
pub async fn get(&self, session_id: SessionId) -> Result<Session>;
pub async fn touch(&self, session_id: SessionId) -> Result<()>; // Update last_accessed
pub async fn delete(&self, session_id: SessionId) -> Result<()>;
// Cleanup
pub async fn cleanup_expired(&self) -> Result<usize>; // Returns count of cleaned sessions
pub async fn get_user_sessions(&self, user_id: &str) -> Result<Vec<SessionId>>;
}

3.1.3 Context Tracker

Responsibility: Track and maintain conversation context across turns

pub struct ContextTracker {
// Uses semantic similarity to track entities and intents
embedding_model: Arc<EmbeddingModel>,
entity_extractor: Arc<EntityExtractor>,
reference_resolver: Arc<ReferenceResolver>,
}
pub struct ConversationContext {
pub turns: VecDeque<ConversationTurn>, // Bounded by max_turns
pub entities: HashMap<String, Entity>, // Tracked entities (tables, columns, values)
pub intents: VecDeque<Intent>, // Intent history
pub current_focus: Option<Focus>, // What is the user currently focused on?
}
pub struct ConversationTurn {
pub turn_id: u32,
pub timestamp: DateTime<Utc>,
pub user_query: String,
pub sql: Option<String>,
pub results_summary: Option<String>, // Don't store full results (memory)
pub entities_mentioned: Vec<String>,
pub intent: Intent,
}
pub enum Intent {
Query, // Retrieve data
Explore, // Browse/understand data
Analyze, // Compute aggregations/trends
Compare, // Compare entities
Filter, // Refine previous results
Clarify, // Ask follow-up question
Export, // Export results
}
pub struct Focus {
pub entity_type: EntityType, // Table, Column, Value, Time Range
pub entity_name: String,
pub context_from_turn: u32, // Which turn introduced this focus?
}
impl ContextTracker {
// Add a new turn
pub async fn add_turn(&self, context: &mut ConversationContext, turn: ConversationTurn);
// Resolve references ("that", "it", "the previous query")
pub async fn resolve_references(&self, query: &str, context: &ConversationContext) -> Result<String>;
// Extract context-relevant information for prompt
pub async fn build_context_prompt(&self, context: &ConversationContext) -> String;
// Detect if context needs clarification
pub fn needs_clarification(&self, query: &str, context: &ConversationContext) -> Option<Vec<String>>;
}

3.1.4 NL2SQL Engine

Responsibility: Convert natural language to SQL with high accuracy

Architecture:

┌─────────────────────────────────────────────────────────────┐
│ NL2SQL Pipeline │
├─────────────────────────────────────────────────────────────┤
│ │
│ 1. Schema Augmentation │
│ - Add column descriptions, examples, relationships │
│ - Add domain-specific vocabulary │
│ - Add value distributions and statistics │
│ │
│ 2. Query Understanding │
│ - Intent detection (filter, aggregate, join, etc.) │
│ - Entity extraction (tables, columns, values) │
│ - Ambiguity detection │
│ │
│ 3. SQL Generation │
│ - Few-shot prompting with similar examples │
│ - Chain-of-thought reasoning │
│ - Multi-dialect support (PostgreSQL, MySQL, Oracle) │
│ │
│ 4. Validation & Correction │
│ - Syntax validation │
│ - Semantic validation (column existence, types) │
│ - Self-correction via LLM feedback loop │
│ │
│ 5. Optimization │
│ - Query rewriting for performance │
│ - Index recommendations │
│ │
└─────────────────────────────────────────────────────────────┘

Implementation:

pub struct Nl2SqlEngine {
schema_augmenter: Arc<SchemaAugmenter>,
query_understander: Arc<QueryUnderstander>,
sql_generator: Arc<SqlGenerator>,
validator: Arc<SqlValidator>,
optimizer: Arc<QueryOptimizer>,
example_store: Arc<ExampleStore>, // For few-shot learning
}
impl Nl2SqlEngine {
pub async fn generate_sql(
&self,
query: &str,
context: &ConversationContext,
schema: &DatabaseSchema,
dialect: SqlDialect,
) -> Result<SqlGenerationResult>;
}
pub struct SqlGenerationResult {
pub sql: String,
pub confidence: f64,
pub reasoning: Vec<String>, // Chain-of-thought steps
pub validation_passed: bool,
pub corrections_applied: Vec<String>, // Auto-corrections made
pub warnings: Vec<String>,
}

Schema Augmentation:

pub struct SchemaAugmenter {
// Enhance schema with semantic information
embedding_model: Arc<EmbeddingModel>,
value_profiler: Arc<ValueProfiler>,
}
pub struct AugmentedSchema {
pub base_schema: DatabaseSchema,
pub table_descriptions: HashMap<String, String>,
pub column_descriptions: HashMap<String, String>,
pub column_examples: HashMap<String, Vec<String>>, // Sample values
pub relationships: Vec<Relationship>,
pub business_terms: HashMap<String, Vec<String>>, // "revenue" -> ["sales_amount", "total_price"]
pub value_distributions: HashMap<String, ValueDistribution>,
}
impl SchemaAugmenter {
// Build augmented schema from database
pub async fn augment(&self, schema: &DatabaseSchema) -> Result<AugmentedSchema>;
// Add custom business vocabulary
pub async fn add_vocabulary(&self, term: &str, mappings: Vec<String>);
// Profile column values
pub async fn profile_columns(&self, table: &str, columns: Vec<String>) -> Result<()>;
}

Few-Shot Example Store:

pub struct ExampleStore {
// Store high-quality NL→SQL examples for few-shot prompting
examples: Arc<DashMap<String, Vec<Example>>>,
embedding_model: Arc<EmbeddingModel>,
vector_index: Arc<HnswIndex>, // For similarity search
}
pub struct Example {
pub id: String,
pub natural_language: String,
pub sql: String,
pub schema_context: String,
pub dialect: SqlDialect,
pub complexity: Complexity,
pub embedding: Vec<f32>,
}
pub enum Complexity {
Simple, // Single table, basic WHERE
Medium, // Joins, GROUP BY
Complex, // Subqueries, CTEs, window functions
}
impl ExampleStore {
// Find k most similar examples for few-shot prompting
pub async fn find_similar(&self, query: &str, k: usize) -> Result<Vec<Example>>;
// Add new example (can learn from successful queries)
pub async fn add_example(&self, example: Example) -> Result<()>;
// Load standard examples (BIRD dataset, Spider, WikiSQL)
pub async fn load_benchmark_examples(&self, dataset: &str) -> Result<usize>;
}

SQL Validator:

pub struct SqlValidator {
parser: Arc<SqlParser>,
schema_validator: Arc<SchemaValidator>,
}
pub struct ValidationResult {
pub valid: bool,
pub errors: Vec<ValidationError>,
pub warnings: Vec<String>,
pub suggestions: Vec<String>,
}
pub enum ValidationError {
SyntaxError { message: String, position: usize },
TableNotFound { table: String },
ColumnNotFound { column: String, table: String },
TypeMismatch { expected: String, found: String },
AmbiguousColumn { column: String, tables: Vec<String> },
}
impl SqlValidator {
// Validate SQL against schema
pub async fn validate(&self, sql: &str, schema: &AugmentedSchema) -> Result<ValidationResult>;
// Suggest corrections
pub async fn suggest_corrections(&self, sql: &str, errors: &[ValidationError]) -> Result<Vec<String>>;
}

3.1.5 Query Explanation Engine

Responsibility: Generate natural language explanations of SQL queries and execution plans

pub struct ExplanationEngine {
plan_explainer: Arc<PlanExplainer>,
performance_predictor: Arc<PerformancePredictor>,
optimizer_suggester: Arc<OptimizerSuggester>,
nlg_model: Arc<NaturalLanguageGenerator>,
}
impl ExplanationEngine {
pub async fn explain_query(
&self,
sql: &str,
execution_plan: &ExecutionPlan,
results: &QueryResults,
) -> Result<Explanation>;
}
pub struct Explanation {
pub summary: String, // One-sentence summary
pub detailed_steps: Vec<String>, // Step-by-step explanation
pub performance_analysis: PerformanceAnalysis,
pub optimization_suggestions: Vec<OptimizationSuggestion>,
pub visual_plan: Option<String>, // ASCII art or mermaid diagram
}
pub struct PerformanceAnalysis {
pub estimated_cost: f64,
pub estimated_rows: usize,
pub index_usage: Vec<String>,
pub bottlenecks: Vec<String>,
pub parallel_potential: Option<String>,
}
pub struct OptimizationSuggestion {
pub suggestion: String,
pub impact: ImpactLevel,
pub effort: EffortLevel,
pub sql_rewrite: Option<String>, // Alternative SQL
}
pub enum ImpactLevel {
High, // >50% improvement
Medium, // 10-50% improvement
Low, // <10% improvement
}
pub enum EffortLevel {
Easy, // Auto-applicable
Medium, // Requires index creation
Hard, // Requires schema changes
}

Plan Explainer:

pub struct PlanExplainer {
// Convert query plans to natural language
}
impl PlanExplainer {
// Generate natural language explanation from execution plan
pub async fn explain_plan(&self, plan: &ExecutionPlan) -> Result<Vec<String>>;
// Generate visual representation (ASCII tree, mermaid)
pub fn visualize_plan(&self, plan: &ExecutionPlan, format: VisualizationFormat) -> String;
}
pub enum VisualizationFormat {
AsciiTree,
Mermaid,
Graphviz,
}

3.1.6 Model Router

Responsibility: Route requests to appropriate LLM models (local or cloud)

pub struct ModelRouter {
models: HashMap<String, Arc<dyn LanguageModel>>,
routing_strategy: RoutingStrategy,
fallback_chain: Vec<String>,
}
pub enum RoutingStrategy {
Primary, // Always use primary model
CostOptimized, // Use cheapest model that meets requirements
LatencyOptimized, // Use fastest model
AccuracyOptimized, // Use most accurate model
Adaptive, // Choose based on query complexity
}
pub trait LanguageModel: Send + Sync {
async fn generate(&self, prompt: &str, config: &GenerationConfig) -> Result<String>;
fn model_name(&self) -> &str;
fn max_tokens(&self) -> usize;
fn cost_per_1k_tokens(&self) -> f64;
}
// Implementations
pub struct OpenAIModel { /* GPT-4, GPT-3.5-turbo */ }
pub struct AnthropicModel { /* Claude Sonnet, Opus */ }
pub struct CohereModel { /* Command, Command-Light */ }
pub struct OllamaModel { /* Local models: Llama, Mistral, CodeLlama */ }
pub struct ONNXModel { /* Local ONNX runtime models */ }
impl ModelRouter {
pub async fn route(&self, prompt: &str, requirements: &ModelRequirements) -> Result<Arc<dyn LanguageModel>>;
}
pub struct ModelRequirements {
pub max_latency_ms: Option<u64>,
pub min_accuracy: Option<f64>,
pub max_cost_per_query: Option<f64>,
pub require_local: bool, // Must use local model (privacy)
}

3.2 Supporting Components

3.2.1 Schema Cache

pub struct SchemaCache {
cache: Arc<DashMap<String, Arc<AugmentedSchema>>>,
ttl: Duration,
}
impl SchemaCache {
pub async fn get_or_load(&self, database: &str) -> Result<Arc<AugmentedSchema>>;
pub async fn invalidate(&self, database: &str);
pub async fn refresh_all(&self) -> Result<()>;
}

3.2.2 Query Cache

pub struct QueryCache {
// Cache NL→SQL mappings
cache: Arc<DashMap<String, CachedQuery>>,
embedding_model: Arc<EmbeddingModel>,
similarity_threshold: f64, // Default: 0.85
}
pub struct CachedQuery {
pub nl_query: String,
pub sql: String,
pub schema_fingerprint: String, // Invalidate if schema changes
pub hit_count: AtomicU64,
pub last_used: DateTime<Utc>,
pub average_latency_ms: f64,
}
impl QueryCache {
// Try to find semantically similar cached query
pub async fn find_similar(&self, query: &str, threshold: f64) -> Option<CachedQuery>;
// Add successful query to cache
pub async fn insert(&self, nl_query: &str, sql: &str, schema_fingerprint: &str);
}

4. Data Flow Architecture

4.1 Query Processing Flow

User Query: "Show me top 10 customers by revenue in 2024"
┌─────────────────────────────────────────────────────────────┐
│ 1. Session Management │
│ - Load/create session │
│ - Load conversation context │
│ - Check rate limits │
└────────────────┬────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ 2. Context Resolution │
│ - Resolve references ("that", "it", "them") │
│ - Merge with previous context │
│ - Detect ambiguities │
└────────────────┬────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ 3. Query Cache Lookup (Semantic) │
│ - Embed query │
│ - Search for similar queries (>85% similarity) │
│ - If found: Return cached SQL + explain │
└────────────────┬────────────────────────────────────────────┘
│ (cache miss)
┌─────────────────────────────────────────────────────────────┐
│ 4. Schema Loading & Augmentation │
│ - Load schema from cache or database │
│ - Augment with descriptions, examples, relationships │
│ - Filter relevant tables/columns (reduce token count) │
└────────────────┬────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ 5. Few-Shot Example Retrieval │
│ - Find k=5 similar examples from example store │
│ - Match on complexity, tables, operations │
└────────────────┬────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ 6. Prompt Construction │
│ - System prompt (role, constraints) │
│ - Schema context (relevant tables/columns) │
│ - Few-shot examples (5 similar NL→SQL pairs) │
│ - Conversation history (last 3-5 turns) │
│ - User query │
└────────────────┬────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ 7. LLM Inference │
│ - Route to appropriate model (local/cloud) │
│ - Generate SQL with chain-of-thought reasoning │
│ - Parse response │
└────────────────┬────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ 8. SQL Validation │
│ - Parse SQL (syntax check) │
│ - Validate against schema (semantic check) │
│ - If invalid: Self-correct via LLM feedback loop │
│ - Max 3 correction attempts │
└────────────────┬────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ 9. Query Optimization │
│ - Analyze query plan │
│ - Apply rule-based optimizations │
│ - Generate optimization suggestions │
└────────────────┬────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ 10. Query Execution (HeliosDB Compute) │
│ - Execute via heliosdb-compute │
│ - Collect execution stats │
│ - Format results │
└────────────────┬────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ 11. Explanation Generation │
│ - Explain query in natural language │
│ - Explain execution plan │
│ - Provide performance analysis │
│ - Suggest optimizations │
└────────────────┬────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ 12. Response Assembly │
│ - SQL + Results + Explanation + Suggestions │
│ - Add to conversation context │
│ - Cache successful query │
│ - Return to user │
└─────────────────────────────────────────────────────────────┘
Output:
SQL: SELECT c.customer_name, SUM(o.amount) as revenue
FROM customers c JOIN orders o ON c.id = o.customer_id
WHERE YEAR(o.order_date) = 2024
GROUP BY c.id, c.customer_name
ORDER BY revenue DESC
LIMIT 10
Explanation: "This query finds the top 10 customers by total revenue in 2024.
It joins the customers and orders tables, filters for 2024 orders,
groups by customer, sums the order amounts, and returns the top 10."
Performance: "Estimated cost: 1250.3, rows: 10
Uses index: idx_orders_customer_date
Bottleneck: Full table scan on customers (recommend index on id)"
Suggestions:
- "Add index on customers(id) for 50% faster joins (High impact, Easy)"
- "Consider materialized view for monthly revenue (Medium impact, Medium effort)"

4.2 Multi-Turn Conversation Flow

Turn 1: "Show me sales by region"
→ SQL: SELECT region, SUM(amount) FROM sales GROUP BY region
→ Context: Focus on "sales table", "region column"
Turn 2: "Which region had the highest growth?"
→ Reference resolution: "region" refers to previous query
→ Need time comparison: Clarify "growth compared to what period?"
→ Response: Clarifying question
Turn 3: "Compared to last year"
→ SQL: WITH current AS (...), previous AS (...)
SELECT ... FROM current JOIN previous ...
→ Context: Time range comparison, "year-over-year growth"
Turn 4: "Break that down by product"
→ Reference: "that" = previous query (regional growth)
→ SQL: Add product dimension to previous query
→ Context: Multi-dimensional analysis (region × product × time)

5. Integration Points

5.1 HeliosDB Compute Integration

// heliosdb-compute provides query execution
use heliosdb_compute::{QueryExecutor, ExecutionPlan, QueryResult};
impl ConversationalBiEngine {
async fn execute_sql(&self, sql: &str, database: &str) -> Result<QueryResult> {
let executor = self.compute_engine.create_executor(database).await?;
// Get execution plan for explanation
let plan = executor.explain(sql).await?;
// Execute query
let result = executor.execute(sql).await?;
Ok((plan, result))
}
}

5.2 Schema Introspection

// Need heliosdb-schema crate (create if doesn't exist)
use heliosdb_schema::{SchemaIntrospector, DatabaseSchema};
pub struct SchemaIntrospector {
metadata_service: Arc<MetadataService>,
}
impl SchemaIntrospector {
pub async fn get_schema(&self, database: &str) -> Result<DatabaseSchema> {
// Query information_schema or system catalogs
let tables = self.get_tables(database).await?;
let columns = self.get_columns(database).await?;
let relationships = self.infer_relationships(database).await?;
let indexes = self.get_indexes(database).await?;
Ok(DatabaseSchema {
tables,
columns,
relationships,
indexes,
})
}
}
pub struct DatabaseSchema {
pub tables: Vec<Table>,
pub columns: Vec<Column>,
pub relationships: Vec<Relationship>,
pub indexes: Vec<Index>,
}

5.3 ML Model Serving Integration

// heliosdb-ml provides model serving
use heliosdb_ml::{MlEngine, ModelConfig};
impl Nl2SqlEngine {
async fn load_local_model(&self, model_path: &str) -> Result<Arc<ONNXModel>> {
let ml_engine = MlEngine::new().await?;
let config = ModelConfig {
name: "nl2sql_local".to_string(),
path: model_path.to_string(),
format: ModelFormat::Onnx,
input_columns: vec!["input_text".to_string()],
output_column: "sql_output".to_string(),
version: "1.0".to_string(),
metadata: Default::default(),
};
ml_engine.register_model(config).await?;
Ok(Arc::new(ONNXModel::new(ml_engine)))
}
}

5.4 Monitoring Integration

// heliosdb-monitoring for observability (use heliosdb-common metrics)
use heliosdb_common::metrics::{MetricsCollector, Histogram, Counter};
pub struct ConversationalMetrics {
query_latency: Histogram,
cache_hit_rate: Counter,
validation_failures: Counter,
correction_attempts: Counter,
model_inference_latency: Histogram,
}
impl ConversationalBiEngine {
fn record_metrics(&self, response: &ConversationResponse) {
self.metrics.query_latency.observe(response.latency_ms);
self.metrics.cache_hit_rate.inc_if(response.from_cache);
// ... more metrics
}
}

6. Multi-Turn Context Management

6.1 Context Structure

pub struct ConversationContext {
// Conversation history (bounded deque)
pub turns: VecDeque<ConversationTurn>,
// Entity tracking
pub mentioned_tables: HashSet<String>,
pub mentioned_columns: HashMap<String, String>, // column -> table
pub mentioned_values: HashMap<String, Vec<String>>, // column -> values
// Current focus (what is the user looking at?)
pub current_tables: Vec<String>,
pub current_filters: Vec<Filter>,
pub current_aggregations: Vec<Aggregation>,
pub current_time_range: Option<TimeRange>,
// Reference tracking (for "it", "that", "them")
pub last_entity: Option<Entity>,
pub last_result_columns: Vec<String>,
// Intent tracking
pub conversation_goal: Option<ConversationGoal>,
}
pub enum ConversationGoal {
Exploration, // User is exploring data
Analysis, // User is performing specific analysis
Reporting, // User is building a report
Debugging, // User is investigating an issue
}

6.2 Reference Resolution

Pronoun Resolution:

impl ContextTracker {
pub async fn resolve_pronouns(&self, query: &str, context: &ConversationContext) -> String {
// "Show me that by product" → "Show me [regional sales] by product"
let resolved = query
.replace("that", &context.last_entity.as_ref().unwrap().to_string())
.replace("it", &context.last_entity.as_ref().unwrap().to_string())
.replace("them", &context.last_entity.as_ref().unwrap().to_string());
resolved
}
}

Implicit Context:

Turn 1: "Show sales in California"
Context: region = "California"
Turn 2: "What about New York?"
Implicit: Keep same query structure, change region filter
Resolved: "Show sales in New York"

6.3 Clarification Detection

pub struct ClarificationEngine {
ambiguity_detector: Arc<AmbiguityDetector>,
}
impl ClarificationEngine {
pub fn detect_ambiguities(&self, query: &str, context: &ConversationContext) -> Vec<Clarification> {
let mut clarifications = Vec::new();
// Time range ambiguity
if contains_time_reference(query) && context.current_time_range.is_none() {
clarifications.push(Clarification {
question: "Which time period are you interested in?".to_string(),
suggestions: vec!["This month", "Last quarter", "This year", "Custom range"],
});
}
// Column ambiguity (multiple columns with same name)
if let Some(ambiguous_columns) = self.find_ambiguous_columns(query, &context.mentioned_tables) {
clarifications.push(Clarification {
question: format!("Which '{}' did you mean?", ambiguous_columns[0]),
suggestions: ambiguous_columns.iter().map(|c| format!("{}.{}", c.table, c.column)).collect(),
});
}
clarifications
}
}

7. NL2SQL Engine Design

7.1 Prompt Engineering Strategy

System Prompt Template:

You are an expert SQL query generator for HeliosDB.
Your task is to convert natural language questions into accurate SQL queries.
CRITICAL RULES:
1. Only use tables and columns that exist in the provided schema
2. Always qualify column names with table aliases when joining tables
3. Use appropriate aggregate functions (SUM, AVG, COUNT, etc.)
4. Format dates using the database's date functions
5. Include appropriate WHERE clauses for filters
6. Use JOINs instead of subqueries when possible for performance
7. Add LIMIT clauses for large result sets unless explicitly asked for all data
8. Use proper GROUP BY when using aggregate functions
QUERY STRUCTURE:
- Start with a clear understanding of the question
- Identify the required tables and columns
- Determine necessary JOINs
- Apply filters (WHERE clause)
- Add aggregations if needed (GROUP BY, HAVING)
- Sort results appropriately (ORDER BY)
- Limit results if appropriate
IMPORTANT:
- If the question is ambiguous, generate the most likely interpretation and note the assumption
- If critical information is missing, indicate what clarification is needed
- Always validate your SQL against the schema before returning it
SQL DIALECT: {dialect} (PostgreSQL/MySQL/Oracle)

Few-Shot Examples:

Example 1:
Question: "What are the top 5 selling products last month?"
Schema: products(id, name), orders(id, product_id, quantity, order_date)
SQL:
SELECT p.name, SUM(o.quantity) as total_quantity
FROM products p
JOIN orders o ON p.id = o.product_id
WHERE o.order_date >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month')
AND o.order_date < DATE_TRUNC('month', CURRENT_DATE)
GROUP BY p.id, p.name
ORDER BY total_quantity DESC
LIMIT 5;
Example 2:
Question: "Show me customers who spent more than $1000"
Schema: customers(id, name, email), orders(id, customer_id, total_amount)
SQL:
SELECT c.name, c.email, SUM(o.total_amount) as total_spent
FROM customers c
JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name, c.email
HAVING SUM(o.total_amount) > 1000
ORDER BY total_spent DESC;
... (3 more examples based on query similarity)

Context Integration:

CONVERSATION HISTORY:
Turn 1: User asked about "sales by region"
SQL: SELECT region, SUM(amount) FROM sales GROUP BY region
Turn 2: User asked "which had the highest?"
Interpretation: Referring to regions from previous query
CURRENT CONTEXT:
- Focused tables: sales
- Focused columns: region, amount
- Time range: Not specified (needs clarification OR assume current year)
- Last entity: "region"
CURRENT QUESTION: "{user_query}"
TASK: Generate SQL query considering the conversation context.
If referring to previous results, build upon the previous query.

7.2 Self-Correction Loop

pub struct SelfCorrectionEngine {
max_attempts: usize, // Default: 3
}
impl SelfCorrectionEngine {
pub async fn correct_sql(
&self,
sql: &str,
validation_errors: &[ValidationError],
schema: &AugmentedSchema,
model: &Arc<dyn LanguageModel>,
) -> Result<String> {
let mut current_sql = sql.to_string();
for attempt in 1..=self.max_attempts {
let validation = self.validator.validate(&current_sql, schema).await?;
if validation.valid {
return Ok(current_sql);
}
// Build correction prompt
let correction_prompt = format!(
"The following SQL query has errors:\n\n{}\n\nErrors:\n{}\n\n\
Schema:\n{}\n\n\
Please correct the SQL query to fix these errors.",
current_sql,
validation.errors.iter().map(|e| format!("- {}", e)).collect::<Vec<_>>().join("\n"),
schema.to_prompt_string()
);
current_sql = model.generate(&correction_prompt, &GenerationConfig::default()).await?;
tracing::info!("Self-correction attempt {}: {}", attempt, current_sql);
}
Err(anyhow::anyhow!("Failed to correct SQL after {} attempts", self.max_attempts))
}
}

7.3 Complex Query Handling

Subqueries and CTEs:

Question: "Show products with above-average prices in each category"
Generated SQL:
WITH category_avg AS (
SELECT category_id, AVG(price) as avg_price
FROM products
GROUP BY category_id
)
SELECT p.name, p.price, c.name as category, ca.avg_price
FROM products p
JOIN categories c ON p.category_id = c.id
JOIN category_avg ca ON p.category_id = ca.category_id
WHERE p.price > ca.avg_price
ORDER BY c.name, p.price DESC;

Window Functions:

Question: "Rank employees by salary within each department"
Generated SQL:
SELECT
e.name,
d.name as department,
e.salary,
RANK() OVER (PARTITION BY e.department_id ORDER BY e.salary DESC) as rank
FROM employees e
JOIN departments d ON e.department_id = d.id
ORDER BY d.name, rank;

8. Query Explanation System

8.1 Explanation Templates

Simple Query Explanation:

SQL: SELECT * FROM customers WHERE country = 'USA'
Explanation:
"This query retrieves all customer records from the United States.
Step-by-step:
1. Scan the customers table
2. Filter rows where country equals 'USA'
3. Return all columns for matching rows
Performance:
- Estimated rows: 1,250 (out of 10,000 total customers)
- Cost: 125.3
- Index used: idx_customers_country (excellent!)
Suggestions:
- Consider adding specific columns instead of SELECT * to reduce data transfer
Example: SELECT name, email, city FROM customers WHERE country = 'USA'"

Complex Query Explanation:

SQL: WITH monthly_revenue AS (
SELECT DATE_TRUNC('month', order_date) as month, SUM(amount) as revenue
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY 1
)
SELECT month, revenue, revenue - LAG(revenue) OVER (ORDER BY month) as growth
FROM monthly_revenue
ORDER BY month;
Explanation:
"This query calculates monthly revenue for 2024 and shows month-over-month growth.
Step-by-step:
1. Create a temporary result set (CTE) called 'monthly_revenue':
- Filter orders from 2024 onwards
- Group by month
- Sum the order amounts for each month
2. Calculate month-over-month growth:
- Use LAG() window function to get previous month's revenue
- Subtract previous month from current month
3. Sort results by month chronologically
Performance:
- Estimated cost: 2,450.8
- Estimated rows: 12 (one per month)
- Full table scan on orders (200,000 rows)
Bottleneck:
- The date filter on orders table requires a full scan
Suggestions:
1. Add index on order_date for 85% faster execution (High impact, Easy)
SQL: CREATE INDEX idx_orders_date ON orders(order_date);
2. Consider partitioning orders table by month for even better performance
(High impact, Medium effort)"

8.2 Visual Query Plan

ASCII Tree Format:

└─ Sort (ORDER BY month)
└─ Window Aggregate (LAG revenue)
└─ CTE Scan: monthly_revenue
└─ Aggregate (GROUP BY month, SUM amount)
└─ Filter (order_date >= '2024-01-01')
└─ Seq Scan: orders
Cost: 2450.8, Rows: 200000

Mermaid Diagram (for web UI):

graph TD
A[Seq Scan: orders] -->|200,000 rows| B[Filter: date >= 2024]
B -->|50,000 rows| C[Group By: month]
C -->|12 rows| D[Window: LAG]
D -->|12 rows| E[Sort: month]

9. Model Architecture

9.1 Supported Models

ModelTypeCost/1M tokensLatencyAccuracyUse Case
GPT-4Cloud$303-5s95%Production (high accuracy)
GPT-3.5-turboCloud$21-2s85%Development/testing
Claude 3 SonnetCloud$152-3s93%Production (balanced)
Claude 3 HaikuCloud$1.250.5-1s82%High-throughput
Cohere CommandCloud$152-3s88%Alternative option
Llama 3 70B (Ollama)LocalFree5-10s82%Privacy-sensitive
CodeLlama 34BLocalFree3-5s85%SQL-focused
Custom ONNXLocalFree1-2s75-85%Fine-tuned domain

9.2 Model Selection Algorithm

impl ModelRouter {
pub async fn select_model(&self, requirements: &ModelRequirements, query_complexity: Complexity) -> Arc<dyn LanguageModel> {
// Priority 1: Privacy requirements
if requirements.require_local {
return self.select_best_local_model(query_complexity);
}
// Priority 2: Latency requirements
if let Some(max_latency) = requirements.max_latency_ms {
if max_latency < 1000 {
return self.models.get("claude-haiku").unwrap().clone();
}
}
// Priority 3: Cost optimization
if let Some(max_cost) = requirements.max_cost_per_query {
return self.select_cheapest_model(max_cost, query_complexity);
}
// Priority 4: Accuracy optimization
match query_complexity {
Complexity::Simple => self.models.get("gpt-3.5-turbo").unwrap().clone(),
Complexity::Medium => self.models.get("claude-sonnet").unwrap().clone(),
Complexity::Complex => self.models.get("gpt-4").unwrap().clone(),
}
}
}

9.3 Fine-Tuning Strategy

Domain Adaptation:

pub struct FineTuningManager {
training_data: Arc<DashMap<String, Vec<TrainingExample>>>,
}
pub struct TrainingExample {
pub schema: String,
pub nl_query: String,
pub sql: String,
pub validated: bool, // Human-validated correct query
}
impl FineTuningManager {
// Collect successful queries for fine-tuning
pub async fn collect_example(&self, schema: &str, nl: &str, sql: &str) {
let example = TrainingExample {
schema: schema.to_string(),
nl_query: nl.to_string(),
sql: sql.to_string(),
validated: false,
};
self.training_data.entry(schema.to_string())
.or_insert_with(Vec::new)
.push(example);
}
// Export for fine-tuning
pub async fn export_training_data(&self, format: TrainingFormat) -> Result<String> {
// Export to JSON, JSONL, or other formats for fine-tuning
// Can be used with OpenAI fine-tuning API, or local training
}
}

10. Performance Optimization

10.1 Caching Strategy

Multi-Level Cache:

L1: In-Memory LRU Cache (hot queries)
- Size: 10,000 queries
- TTL: 1 hour
- Hit rate: 40-50%
L2: Redis Cache (warm queries)
- Size: 100,000 queries
- TTL: 24 hours
- Hit rate: 30-40%
L3: Semantic Similarity Cache
- Embedding-based retrieval
- Threshold: 0.85 cosine similarity
- Hit rate: 20-30%
Total cache hit rate: 90-95%

10.2 Latency Optimization

Parallel Processing:

impl ConversationalBiEngine {
pub async fn process_query_parallel(&self, session_id: SessionId, query: &str) -> Result<ConversationResponse> {
// Execute in parallel:
let (schema_future, examples_future, context_future) = tokio::join!(
self.schema_cache.get_or_load(&session.database),
self.example_store.find_similar(query, 5),
self.context_tracker.build_context_prompt(&session.context),
);
let schema = schema_future?;
let examples = examples_future?;
let context = context_future?;
// Build prompt (fast, <10ms)
let prompt = self.build_prompt(query, &schema, &examples, &context);
// LLM inference (slowest, 1-5s)
let sql = self.model_router.route(&prompt, &requirements).await?.generate(&prompt).await?;
// Validate & execute in parallel
let (validation_future, plan_future) = tokio::join!(
self.validator.validate(&sql, &schema),
self.compute_engine.explain(&sql),
);
// ...
}
}

Token Optimization:

impl SchemaAugmenter {
// Reduce schema to only relevant tables/columns
pub fn filter_relevant_schema(&self, schema: &AugmentedSchema, query: &str) -> AugmentedSchema {
let mentioned_tables = self.extract_likely_tables(query, schema);
let filtered_tables = schema.tables.iter()
.filter(|t| mentioned_tables.contains(&t.name))
.cloned()
.collect();
// Reduces token count by 70-90%, improves latency 2-3x
AugmentedSchema {
tables: filtered_tables,
..schema.clone()
}
}
}

10.3 Throughput Optimization

Connection Pooling:

pub struct SessionPool {
pool: Pool<SessionConnection>,
max_connections: usize, // Default: 1000
}
// Reuse LLM connections, reduce overhead

Batching:

// Batch multiple queries to same model
pub async fn process_batch(&self, queries: Vec<(SessionId, String)>) -> Result<Vec<ConversationResponse>> {
// Group by model
// Send as batch to reduce network overhead
// 30-50% latency reduction for high-throughput scenarios
}

11. Testing Strategy

11.1 Accuracy Testing

Benchmark Datasets:

  1. BIRD (BIg Bench for LaRge Database Grounded Text-to-SQL Evaluation)

    • 12,751 unique questions
    • 95 databases
    • Complex queries (CTEs, window functions, nested subqueries)
    • Target: 95%+ execution accuracy
  2. Spider

    • 10,181 questions
    • 200 databases
    • Medium complexity
    • Target: 98%+ accuracy
  3. WikiSQL

    • 80,654 questions
    • Simple queries
    • Target: 99%+ accuracy

Accuracy Metrics:

pub struct AccuracyMetrics {
pub exact_match: f64, // SQL exactly matches gold standard
pub execution_match: f64, // Results match (more lenient)
pub valid_sql: f64, // SQL is syntactically valid
pub schema_valid: f64, // SQL is semantically valid for schema
}
impl AccuracyTester {
pub async fn evaluate_on_bird(&self) -> AccuracyMetrics {
let bird_dataset = self.load_bird_dataset()?;
let mut metrics = AccuracyMetrics::default();
for example in bird_dataset {
let generated_sql = self.engine.generate_sql(&example.question, &example.schema).await?;
if generated_sql == example.gold_sql {
metrics.exact_match += 1.0;
}
let generated_results = self.execute(generated_sql)?;
let gold_results = self.execute(example.gold_sql)?;
if generated_results == gold_results {
metrics.execution_match += 1.0;
}
}
metrics.normalize(bird_dataset.len());
metrics
}
}

11.2 Multi-Turn Testing

pub struct MultiTurnTestCase {
pub conversation: Vec<ConversationTurn>,
pub expected_sql: Vec<String>,
pub test_context_retention: bool,
pub test_reference_resolution: bool,
}
// Example test case
let test = MultiTurnTestCase {
conversation: vec![
ConversationTurn {
query: "Show me sales by region",
expected_sql: "SELECT region, SUM(amount) FROM sales GROUP BY region",
},
ConversationTurn {
query: "Which had the highest?",
expected_sql: "SELECT region, SUM(amount) as total FROM sales GROUP BY region ORDER BY total DESC LIMIT 1",
},
ConversationTurn {
query: "Show that by month",
expected_sql: "SELECT DATE_TRUNC('month', sale_date) as month, region, SUM(amount) FROM sales WHERE region = 'West' GROUP BY month, region",
},
],
test_context_retention: true,
test_reference_resolution: true,
};

11.3 Performance Testing

pub struct PerformanceBenchmark {
latency_p50: Duration,
latency_p95: Duration,
latency_p99: Duration,
throughput_qps: f64,
cache_hit_rate: f64,
}
impl PerformanceTester {
pub async fn benchmark(&self, concurrency: usize, duration: Duration) -> PerformanceBenchmark {
let mut latencies = Vec::new();
let start = Instant::now();
let mut total_queries = 0;
// Run concurrent queries
let tasks: Vec<_> = (0..concurrency)
.map(|_| {
tokio::spawn(async move {
while start.elapsed() < duration {
let query_start = Instant::now();
let _ = self.engine.process_query(session_id, random_query()).await;
latencies.push(query_start.elapsed());
total_queries += 1;
}
})
})
.collect();
// Wait for all tasks
for task in tasks {
task.await.unwrap();
}
// Calculate metrics
latencies.sort();
PerformanceBenchmark {
latency_p50: latencies[latencies.len() / 2],
latency_p95: latencies[latencies.len() * 95 / 100],
latency_p99: latencies[latencies.len() * 99 / 100],
throughput_qps: total_queries as f64 / duration.as_secs_f64(),
cache_hit_rate: self.query_cache.hit_rate(),
}
}
}

11.4 SQL Dialect Testing

pub struct DialectTestSuite {
dialects: Vec<SqlDialect>,
}
impl DialectTestSuite {
pub async fn test_all_dialects(&self) {
for dialect in &self.dialects {
let examples = self.load_dialect_specific_examples(dialect);
for example in examples {
let sql = self.engine.generate_sql(&example.query, dialect).await?;
// Validate SQL is valid for this dialect
assert!(self.validate_dialect_sql(&sql, dialect));
// Check dialect-specific features
match dialect {
SqlDialect::PostgreSQL => {
// Test RETURNING, CTEs, window functions
}
SqlDialect::MySQL => {
// Test LIMIT syntax, ON DUPLICATE KEY UPDATE
}
SqlDialect::Oracle => {
// Test ROWNUM, CONNECT BY, etc.
}
}
}
}
}
}

12. Implementation Roadmap

12.1 Phase 1: Foundation (3 weeks, $240K)

Week 1: Core Infrastructure

  • Create heliosdb-conversational-bi crate
  • Implement SessionManager with basic CRUD
  • Implement ConversationContext data structures
  • Implement SchemaCache with TTL
  • Deliverable: Basic session management working
  • Team: 1 Senior Engineer, 1 Mid-Level Engineer

Week 2: Schema Integration

  • Create heliosdb-schema crate (schema introspection)
  • Implement SchemaAugmenter (descriptions, examples)
  • Integrate with existing heliosdb-metadata
  • Build schema → prompt converter
  • Deliverable: Schema augmentation pipeline
  • Team: 1 Senior Engineer, 1 Mid-Level Engineer

Week 3: Model Integration

  • Implement ModelRouter with OpenAI, Anthropic, Cohere clients
  • Implement local model support (Ollama integration)
  • Build prompt template system
  • Implement basic NL→SQL generation (no context yet)
  • Deliverable: Single-turn NL2SQL working with 70%+ accuracy
  • Team: 1 Senior ML Engineer, 1 Mid-Level Engineer

12.2 Phase 2: NL2SQL Engine (4 weeks, $320K)

Week 4: Query Understanding

  • Implement QueryUnderstander (intent, entity extraction)
  • Build EntityExtractor using NER models
  • Implement ambiguity detection
  • Deliverable: Intent and entity extraction working
  • Team: 1 Senior ML Engineer, 1 Mid-Level Engineer

Week 5: SQL Generation

  • Implement SqlGenerator with chain-of-thought prompting
  • Build ExampleStore with vector similarity search
  • Load BIRD, Spider, WikiSQL examples
  • Implement few-shot example retrieval
  • Deliverable: Complex SQL generation (CTEs, window functions)
  • Team: 1 Senior Engineer, 1 ML Engineer

Week 6: Validation & Correction

  • Implement SqlValidator (syntax + semantic)
  • Build SelfCorrectionEngine with feedback loop
  • Implement error→correction prompt generation
  • Deliverable: Self-correcting SQL generation, 85%+ accuracy
  • Team: 1 Senior Engineer, 1 Mid-Level Engineer

Week 7: Multi-Dialect Support

  • Extend to PostgreSQL, MySQL, Oracle dialects
  • Implement dialect-specific transformations
  • Build dialect-specific validation rules
  • Deliverable: Multi-dialect SQL generation
  • Team: 1 Senior Engineer

12.3 Phase 3: Multi-Turn Context (3 weeks, $240K)

Week 8: Context Tracking

  • Implement ContextTracker with entity tracking
  • Build reference resolution (“it”, “that”, “them”)
  • Implement conversation history summarization
  • Deliverable: 5+ turn conversations working
  • Team: 1 Senior Engineer, 1 Mid-Level Engineer

Week 9: Clarification Engine

  • Implement ClarificationEngine (ambiguity detection)
  • Build clarifying question generation
  • Implement interactive clarification flow
  • Deliverable: System asks clarifying questions when needed
  • Team: 1 Senior Engineer

Week 10: Context Optimization

  • Optimize context window usage (token reduction)
  • Implement semantic compression
  • Build context pruning strategies
  • Deliverable: 10+ turn conversations within token budget
  • Team: 1 ML Engineer

12.4 Phase 4: Query Explanation (2 weeks, $160K)

Week 11: Plan Explanation

  • Implement ExplanationEngine
  • Build PlanExplainer (natural language from execution plan)
  • Implement visual plan generation (ASCII, Mermaid)
  • Deliverable: Natural language query explanations
  • Team: 1 Senior Engineer, 1 Mid-Level Engineer

Week 12: Optimization Suggestions

  • Implement OptimizerSuggester
  • Build index recommendation engine
  • Implement query rewrite suggestions
  • Deliverable: Actionable optimization suggestions
  • Team: 1 Senior Engineer (Database Optimization background)

12.5 Phase 5: Production Hardening & Testing (2.5 weeks, $240K)

Week 13: Performance Optimization

  • Implement query caching (semantic similarity)
  • Optimize prompt construction (parallel loading)
  • Implement batching for throughput
  • Deliverable: <2s p50 latency, >80% cache hit rate
  • Team: 1 Senior Engineer, 1 Performance Engineer

Week 14-15: Testing & Benchmarking

  • Run BIRD benchmark suite (target 95%+)
  • Run Spider benchmark suite (target 98%+)
  • Multi-turn conversation testing (100 test cases)
  • Performance benchmarking (latency, throughput)
  • Security testing (SQL injection prevention, access control)
  • Deliverable: All benchmarks passing, production-ready
  • Team: 2 QA Engineers, 1 Security Engineer

12.6 Timeline Summary

PhaseDurationCostDeliverables
Phase 1: Foundation3 weeks$240KSession mgmt, schema integration, basic NL2SQL
Phase 2: NL2SQL Engine4 weeks$320KAdvanced SQL generation, validation, 85%+ accuracy
Phase 3: Multi-Turn Context3 weeks$240K10+ turn conversations, reference resolution
Phase 4: Query Explanation2 weeks$160KNatural language explanations, optimization tips
Phase 5: Production Hardening2.5 weeks$240KPerformance tuning, benchmarking, 95%+ accuracy
TOTAL14.5 weeks (2.5 months)$1.2MProduction-ready Conversational BI

Buffer: 2 weeks, $200K (contingency) Total with buffer: 16.5 weeks, $1.4M


13. Success Metrics

13.1 Accuracy Metrics

MetricTargetMeasurement Method
BIRD Execution Accuracy95%+Run against full BIRD dataset (12,751 questions)
Spider Accuracy98%+Run against Spider dataset (10,181 questions)
WikiSQL Accuracy99%+Run against WikiSQL dataset (80,654 questions)
Exact Match Rate60%+Generated SQL exactly matches gold standard
Multi-Turn Accuracy90%+Correct SQL in 10+ turn conversations

13.2 Performance Metrics

MetricTargetMeasurement Method
Latency (p50)<2sMeasure query generation time
Latency (p99)<5sMeasure query generation time
Throughput100+ QPSConcurrent session testing
Cache Hit Rate80%+Query cache effectiveness
Availability99.9%Uptime monitoring

13.3 User Experience Metrics

MetricTargetMeasurement Method
Clarification Rate<20%How often system needs to ask clarifying questions
User Satisfaction4.5/5User survey (1-5 scale)
Adoption Rate60%+% of users who try it and continue using
Time to Insight3x fasterCompared to manual SQL writing

13.4 Business Metrics

MetricTargetImpact
ARR$60MRevenue from Conversational BI feature
Market Share#1 in NL2SQLBest accuracy in the market (95%+ vs SOTA 68-80%)
Patent Value$18M-$28MMulti-turn conversation patent
Customer Logos500+Enterprise customers using feature

14. Risk Mitigation

14.1 Technical Risks

RiskProbabilityImpactMitigation
Accuracy below 95%MediumHighUse ensemble models, extensive fine-tuning, human-in-the-loop for edge cases
Latency exceeds 2sMediumMediumAggressive caching, model optimization, consider smaller models for simple queries
Model hallucinationMediumHighStrict validation, self-correction loop, confidence thresholds
Schema driftLowMediumSchema versioning, cache invalidation on schema changes

14.2 Operational Risks

RiskProbabilityImpactMitigation
LLM API costsMediumMediumImplement cost caps, local model fallback, query caching
LLM API downtimeMediumHighMulti-provider strategy, local model fallback
Scalability issuesLowHighHorizontal scaling, connection pooling, distributed caching
Security vulnerabilitiesLowHighSQL injection prevention, parameterized queries, access control

14.3 Business Risks

RiskProbabilityImpactMitigation
Low adoptionLowHighUser training, documentation, gradual rollout, collect feedback
Competitor launches similarMediumMediumPatent filing, first-mover advantage, continuous improvement
Privacy concernsMediumMediumLocal model option, transparent data handling, compliance certifications

15. Future Enhancements (Post-v7.0)

15.1 Advanced Features

  1. Multi-Modal Queries

    • Support screenshots of charts/tables as input
    • “Recreate this chart” → generates SQL + visualization config
  2. Proactive Insights

    • System suggests analyses based on data patterns
    • “I noticed sales dropped 15% in APAC last week. Want to investigate?”
  3. Automated Report Generation

    • “Create a weekly sales report” → generates scheduled queries
  4. Collaborative Features

    • Share conversation sessions across team
    • Comment on queries, suggest improvements
  5. Integration with BI Tools

    • Export to Tableau, Power BI, Looker
    • Generate dashboard from conversation

15.2 Research Directions

  1. Reinforcement Learning from Human Feedback (RLHF)

    • Learn from user corrections
    • Continuously improve accuracy
  2. Graph-Based Query Planning

    • Use graph neural networks for join optimization
    • Predict optimal execution plans
  3. Federated NL2SQL

    • Query across multiple databases
    • Handle heterogeneous schemas

Appendix A: API Examples

A.1 REST API

POST /api/v1/conversational-bi/sessions
Content-Type: application/json
{
"user_id": "user123",
"database": "sales_db"
}
Response:
{
"session_id": "sess_abc123",
"created_at": "2025-11-09T10:00:00Z",
"expires_at": "2025-11-09T10:30:00Z"
}
POST /api/v1/conversational-bi/query
Content-Type: application/json
{
"session_id": "sess_abc123",
"query": "Show me top 10 customers by revenue in 2024"
}
Response:
{
"turn_id": 1,
"sql": "SELECT c.customer_name, SUM(o.amount) as revenue FROM customers c JOIN orders o ON c.id = o.customer_id WHERE YEAR(o.order_date) = 2024 GROUP BY c.id, c.customer_name ORDER BY revenue DESC LIMIT 10",
"explanation": "This query finds the top 10 customers by total revenue in 2024...",
"results": {
"columns": ["customer_name", "revenue"],
"rows": [
["Acme Corp", 1500000],
["TechStart Inc", 1200000],
...
]
},
"confidence": 0.95,
"suggestions": [
"Add index on customers(id) for 50% faster joins"
],
"latency_ms": 1850
}

A.2 SQL Interface

-- Start conversation
SELECT conversational_bi.start_session('sales_db');
-- Returns: sess_abc123
-- Ask question
SELECT * FROM conversational_bi.ask(
'sess_abc123',
'Show me top 10 customers by revenue in 2024'
);
-- Returns: Generated SQL, explanation, results
-- Get conversation history
SELECT * FROM conversational_bi.history('sess_abc123');

A.3 Python SDK

from heliosdb import ConversationalBI
# Initialize
bi = ConversationalBI(connection_string="postgresql://localhost/sales_db")
# Start session
session = bi.start_session(user_id="user123")
# Ask questions
response = session.ask("Show me top 10 customers by revenue in 2024")
print(response.sql)
print(response.explanation)
print(response.results.to_dataframe())
# Follow-up
response = session.ask("Which region had the highest?")
print(response.sql) # Uses context from previous query
# Get optimization suggestions
for suggestion in response.suggestions:
print(f"- {suggestion.description} (Impact: {suggestion.impact})")

Appendix B: Patent Analysis

B.1 Patentable Innovations

1. Multi-Turn Context-Aware NL2SQL System

  • Claim: A method for maintaining conversation state across multiple natural language queries to a database, comprising:

    • Tracking entities mentioned across conversation turns
    • Resolving pronominal references using entity history
    • Building context-aware prompts that incorporate previous query results
    • Detecting when clarification is needed based on context ambiguity
  • Confidence: 85%

  • Prior Art: Limited systems handle 10+ turn conversations with full context retention

  • Estimated Value: $18M-$28M

2. Self-Correcting SQL Generation via LLM Feedback

  • Claim: A system for automatically correcting invalid SQL queries using iterative LLM refinement, comprising:

    • Validating generated SQL against database schema
    • Constructing correction prompts from validation errors
    • Iteratively refining SQL through multiple LLM calls
    • Terminating after N attempts or successful validation
  • Confidence: 70%

  • Prior Art: Some self-correction systems exist, but not specifically for SQL

  • Estimated Value: $8M-$12M

3. Schema-Augmented Few-Shot Learning for NL2SQL

  • Claim: A method for improving NL2SQL accuracy using augmented database schemas, comprising:

    • Enhancing schema with column descriptions, examples, and relationships
    • Selecting few-shot examples based on semantic similarity
    • Filtering schema to relevant tables/columns to reduce token count
    • Building prompts that combine schema, examples, and conversation context
  • Confidence: 75%

  • Prior Art: Few-shot NL2SQL exists, but schema augmentation is novel

  • Estimated Value: $10M-$15M

B.2 Filing Recommendation

Priority: P0 (file immediately) Justification: Core differentiators for Conversational BI feature Filing Strategy: US provisional → PCT → US/EU/China non-provisional


Appendix C: Competitive Analysis

FeatureHeliosDBTableau Ask DataMicrosoft CopilotThoughtSpotMetabase
Multi-Turn Context10+ turns❌ Single turn⚠ 3-5 turns5+ turns❌ Single turn
Accuracy (BIRD)95%+❌ Not tested⚠ ~75%⚠ ~80%❌ Not tested
Complex QueriesCTEs, window functions⚠ LimitedFull SQLFull SQL⚠ Limited
ExplanationNL + Visual⚠ BasicDetailedDetailed❌ None
Local Model SupportYes (Ollama)❌ Cloud only❌ Cloud only❌ Cloud only❌ Cloud only
Multi-Dialect3+ dialects⚠ LimitedSQL Server⚠ Limited10+ dialects
PricingIncluded$70/user/month$30/user/month$95/user/monthFree (OSS)

Competitive Advantages:

  1. Best-in-class accuracy (95%+ vs 75-80%)
  2. Deepest multi-turn context (10+ turns vs 3-5)
  3. Local model support (privacy-sensitive use cases)
  4. Integrated with database (no separate BI tool needed)
  5. Open source option via Metabase integration

Document Status: Complete - Ready for Implementation Next Steps:

  1. Review with engineering leadership
  2. File provisional patent applications (3 innovations)
  3. Update Series A materials with Conversational BI
  4. Begin Phase 1 implementation (Week 1: Core Infrastructure)

Total Investment: $1.0M-$1.4M Timeline: 2.5-3 months Expected ROI: $60M ARR, 43x-60x return