Conversational BI Architecture Design
HeliosDB v7.0 World-First Innovation
Document Version: 1.0
Created: November 9, 2025
Status: Architecture Design - Ready for Implementation
Investment: $1M over 2.5 months
ARR Impact: $60M
Target Accuracy: 95%+ on BIRD dataset (vs. current SOTA 68-80%)
Executive Summary
This document defines the complete architecture for HeliosDB’s Conversational BI system - a natural language to SQL interface with multi-turn context management, 95%+ accuracy, and comprehensive query explanation capabilities.
Key Differentiators:
- Multi-Turn Context: 10+ turn conversation memory with state management
- Best-in-Class Accuracy: 95%+ on BIRD dataset (vs. SOTA 68-80%)
- Deep Explanation: Natural language query plans with optimization suggestions
- Production-Ready: <2s latency, scalable, integrated with existing HeliosDB components
- Flexible Model Support: Local (Ollama, ONNX) and cloud (OpenAI, Anthropic, Cohere)
Table of Contents
- System Overview
- Architecture Principles
- Component Design
- Data Flow Architecture
- Integration Points
- Multi-Turn Context Management
- NL2SQL Engine Design
- Query Explanation System
- Model Architecture
- Performance Optimization
- Testing Strategy
- Implementation Roadmap
- Success Metrics
- Risk Mitigation
1. System Overview
1.1 Vision
Enable business users to interact with HeliosDB using natural language, eliminating the need for SQL expertise while maintaining the power and precision of structured queries.
1.2 Core Capabilities
┌─────────────────────────────────────────────────────────────────┐
│                    Conversational BI System                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Input:  "Show me revenue trends for Q4 2024 by region"         │
│          "Which region had the highest growth?"                 │
│          "Break that down by product category"                  │
│                                                                 │
│     ↓ Multi-Turn Context Engine                                 │
│     ↓ Schema-Aware NL2SQL                                       │
│     ↓ Query Validation & Optimization                           │
│     ↓ Execution & Result Formatting                             │
│     ↓ Natural Language Explanation                              │
│                                                                 │
│  Output: SQL Query + Results + Explanation + Optimization Tips  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

1.3 Architecture Layers
┌──────────────────────────────────────────────────────────────────┐
│                       User Interface Layer                       │
│   (SQL CLI, Web UI, GraphQL API, REST API, Protocol Handlers)    │
└────────────────────────┬─────────────────────────────────────────┘
                         │
┌────────────────────────▼─────────────────────────────────────────┐
│                     Conversational BI Engine                     │
│  ┌──────────────┐  ┌────────────┐  ┌─────────────────────────┐   │
│  │   Session    │  │  Context   │  │      Clarification      │   │
│  │   Manager    │  │  Tracker   │  │        Engine           │   │
│  └──────────────┘  └────────────┘  └─────────────────────────┘   │
└────────────────────────┬─────────────────────────────────────────┘
                         │
┌────────────────────────▼─────────────────────────────────────────┐
│                         NL2SQL Engine                            │
│  ┌──────────────┐  ┌────────────┐  ┌─────────────────────────┐   │
│  │   Schema     │  │   Query    │  │      Validation &       │   │
│  │  Augmenter   │  │ Generator  │  │       Correction        │   │
│  └──────────────┘  └────────────┘  └─────────────────────────┘   │
└────────────────────────┬─────────────────────────────────────────┘
                         │
┌────────────────────────▼─────────────────────────────────────────┐
│                    Query Explanation Layer                       │
│  ┌──────────────┐  ┌────────────┐  ┌─────────────────────────┐   │
│  │    Plan      │  │ Performance│  │      Optimization       │   │
│  │  Explainer   │  │ Predictor  │  │       Suggester         │   │
│  └──────────────┘  └────────────┘  └─────────────────────────┘   │
└────────────────────────┬─────────────────────────────────────────┘
                         │
┌────────────────────────▼─────────────────────────────────────────┐
│                   HeliosDB Core Integration                      │
│  ┌──────────────┐  ┌────────────┐  ┌─────────────────────────┐   │
│  │  heliosdb-   │  │ heliosdb-  │  │  heliosdb-monitoring    │   │
│  │   compute    │  │  schema    │  │                         │   │
│  └──────────────┘  └────────────┘  └─────────────────────────┘   │
└──────────────────────────────────────────────────────────────────┘

2. Architecture Principles
2.1 Design Principles
- Accuracy First: Prioritize correctness over speed (within latency budget)
- Context Awareness: Maintain full conversation history and leverage it intelligently
- Schema-Driven: Use database schema as ground truth for query generation
- Explainability: Every SQL query comes with a clear explanation
- Incremental Refinement: Support iterative query improvement through clarifications
- Model Agnostic: Support both local and cloud LLM models
- Production Hardened: Robust error handling, caching, monitoring
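Two of the principles above meet in the model layer: "Model Agnostic" plus "Production Hardened" implies a fallback chain, where a failed or timed-out call to the primary model is retried against the next model in the chain. A minimal sketch of that control flow (the `Model` trait and toy models here are illustrative stand-ins, not the final `LanguageModel` API):

```rust
// Sketch only: `Model`, `Flaky`, and `Reliable` are hypothetical stand-ins
// used to illustrate the fallback-chain principle.
trait Model {
    fn name(&self) -> &str;
    fn generate(&self, prompt: &str) -> Result<String, String>;
}

struct Flaky;    // always fails, simulating a primary-model outage
struct Reliable; // always succeeds, simulating the fallback

impl Model for Flaky {
    fn name(&self) -> &str { "primary" }
    fn generate(&self, _prompt: &str) -> Result<String, String> {
        Err("timeout".to_string())
    }
}

impl Model for Reliable {
    fn name(&self) -> &str { "fallback" }
    fn generate(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("SQL for: {prompt}"))
    }
}

/// Try each model in order; return the first success and the model used.
fn generate_with_fallback(
    chain: &[&dyn Model],
    prompt: &str,
) -> Result<(String, String), String> {
    let mut last_err = "empty model chain".to_string();
    for model in chain {
        match model.generate(prompt) {
            Ok(sql) => return Ok((model.name().to_string(), sql)),
            Err(e) => last_err = e, // record the error, fall through to next model
        }
    }
    Err(last_err)
}

fn main() {
    let chain: &[&dyn Model] = &[&Flaky, &Reliable];
    let (used, _sql) = generate_with_fallback(chain, "top customers").unwrap();
    assert_eq!(used, "fallback");
    println!("served by: {used}");
}
```

The production `ModelRouter` (Section 3.1.6) generalizes this with routing strategies and per-request requirements, but the error-handling skeleton is the same.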
2.2 Non-Functional Requirements
| Requirement | Target | Measurement |
|---|---|---|
| Accuracy | 95%+ | BIRD dataset execution accuracy |
| Latency | <2s p50, <5s p99 | Query generation time |
| Context Length | 10+ turns | Conversation depth |
| Throughput | 100+ queries/sec | Concurrent conversations |
| Availability | 99.9% | Uptime SLA |
| Cache Hit Rate | >80% | Repeated/similar queries |
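The latency row above implies percentile tracking over per-query generation timings. A nearest-rank percentile sketch, purely illustrative (production code would use the metrics library's histogram, not this):

```rust
/// Nearest-rank percentile over latency samples in milliseconds.
/// Illustrative only: shows how the p50/p99 targets would be measured.
fn percentile(samples: &mut Vec<f64>, p: f64) -> f64 {
    assert!(!samples.is_empty() && (0.0..=100.0).contains(&p));
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    // Nearest-rank method: ceil(p/100 * N), as a 1-based index.
    let rank = ((p / 100.0) * samples.len() as f64).ceil().max(1.0) as usize;
    samples[rank - 1]
}

fn main() {
    let mut latencies = vec![120.0, 90.0, 1800.0, 450.0, 300.0];
    let p50 = percentile(&mut latencies, 50.0);
    let p99 = percentile(&mut latencies, 99.0);
    // Targets from the table: p50 < 2000 ms, p99 < 5000 ms.
    assert!(p50 < 2000.0 && p99 < 5000.0);
    println!("p50={p50}ms p99={p99}ms");
}
```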
2.3 Security & Privacy
- Data Privacy: No query data sent to cloud unless explicitly configured
- Schema Protection: Schema metadata encrypted in transit/rest
- Access Control: Respect existing HeliosDB RBAC/ABAC policies
- Audit Trail: All NL queries logged with user context
- PII Detection: Automatic detection and masking of sensitive data in explanations
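The PII detection requirement can be sketched as a masking pass over explanation text before it leaves the engine. The heuristic below (redacting email-like tokens) is a deliberately tiny illustration, not the production detector, which would cover more PII classes (phone numbers, IDs, names) and use real pattern matching:

```rust
/// Minimal sketch of PII masking for explanation text.
/// Only email-like tokens are redacted here; illustrative, not exhaustive.
fn mask_pii(text: &str) -> String {
    text.split_whitespace()
        .map(|token| {
            // Crude email heuristic: non-empty local part, '@', dotted domain.
            let looks_like_email = token
                .split_once('@')
                .map_or(false, |(user, domain)| !user.is_empty() && domain.contains('.'));
            if looks_like_email { "[REDACTED]" } else { token }
        })
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    let masked = mask_pii("Top customer is alice@example.com with $12k revenue");
    assert_eq!(masked, "Top customer is [REDACTED] with $12k revenue");
    println!("{masked}");
}
```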
3. Component Design
3.1 Core Components
3.1.1 Conversational BI Engine (heliosdb-conversational-bi)
Responsibility: Orchestrate the entire NL2SQL pipeline
Key Modules:
pub struct ConversationalBiEngine {
    session_manager: Arc<SessionManager>,
    context_tracker: Arc<ContextTracker>,
    nl2sql_engine: Arc<Nl2SqlEngine>,
    explanation_engine: Arc<ExplanationEngine>,
    schema_cache: Arc<SchemaCache>,
    query_cache: Arc<QueryCache>,
    model_router: Arc<ModelRouter>,
    config: ConversationalConfig,
}

pub struct ConversationalConfig {
    pub max_turns: usize,                    // Default: 20
    pub context_window_tokens: usize,        // Default: 8000
    pub primary_model: ModelConfig,          // Primary LLM
    pub fallback_model: Option<ModelConfig>, // Fallback if primary fails
    pub enable_query_cache: bool,            // Default: true
    pub enable_schema_cache: bool,           // Default: true
    pub max_concurrent_sessions: usize,      // Default: 1000
    pub session_timeout_minutes: u64,        // Default: 30
}

API Surface:
impl ConversationalBiEngine {
    // Start a new conversation
    pub async fn create_session(
        &self,
        user_id: &str,
        database: &str,
    ) -> Result<SessionId>;

    // Process a natural language query
    pub async fn process_query(
        &self,
        session_id: SessionId,
        query: &str,
    ) -> Result<ConversationResponse>;

    // Get conversation history
    pub async fn get_history(
        &self,
        session_id: SessionId,
    ) -> Result<Vec<ConversationTurn>>;

    // Clear conversation context
    pub async fn reset_session(
        &self,
        session_id: SessionId,
    ) -> Result<()>;
}

pub struct ConversationResponse {
    pub turn_id: u32,
    pub sql: Option<String>,           // Generated SQL (if query was understood)
    pub explanation: String,           // Natural language explanation
    pub results: Option<QueryResults>, // Execution results
    pub clarifications: Vec<String>,   // Clarifying questions (if ambiguous)
    pub suggestions: Vec<String>,      // Optimization suggestions
    pub confidence: f64,               // 0.0-1.0 confidence score
    pub latency_ms: f64,
}

3.1.2 Session Manager
Responsibility: Manage conversation sessions and lifecycle
pub struct SessionManager {
    sessions: Arc<DashMap<SessionId, Session>>,
    persistence: Arc<SessionPersistence>,
    config: SessionConfig,
}

pub struct Session {
    pub id: SessionId,
    pub user_id: String,
    pub database: String,
    pub created_at: DateTime<Utc>,
    pub last_accessed: DateTime<Utc>,
    pub turn_count: u32,
    pub context: ConversationContext,
}

pub struct SessionConfig {
    pub max_sessions_per_user: usize,  // Default: 10
    pub idle_timeout_minutes: u64,     // Default: 30
    pub persist_sessions: bool,        // Default: true (for recovery)
    pub cleanup_interval_minutes: u64, // Default: 5
}

impl SessionManager {
    // Session lifecycle
    pub async fn create(&self, user_id: &str, database: &str) -> Result<SessionId>;
    pub async fn get(&self, session_id: SessionId) -> Result<Session>;
    pub async fn touch(&self, session_id: SessionId) -> Result<()>; // Update last_accessed
    pub async fn delete(&self, session_id: SessionId) -> Result<()>;

    // Cleanup
    pub async fn cleanup_expired(&self) -> Result<usize>; // Returns count of cleaned sessions
    pub async fn get_user_sessions(&self, user_id: &str) -> Result<Vec<SessionId>>;
}

3.1.3 Context Tracker
Responsibility: Track and maintain conversation context across turns
pub struct ContextTracker {
    // Uses semantic similarity to track entities and intents
    embedding_model: Arc<EmbeddingModel>,
    entity_extractor: Arc<EntityExtractor>,
    reference_resolver: Arc<ReferenceResolver>,
}

pub struct ConversationContext {
    pub turns: VecDeque<ConversationTurn>, // Bounded by max_turns
    pub entities: HashMap<String, Entity>, // Tracked entities (tables, columns, values)
    pub intents: VecDeque<Intent>,         // Intent history
    pub current_focus: Option<Focus>,      // What is the user currently focused on?
}

pub struct ConversationTurn {
    pub turn_id: u32,
    pub timestamp: DateTime<Utc>,
    pub user_query: String,
    pub sql: Option<String>,
    pub results_summary: Option<String>, // Don't store full results (memory)
    pub entities_mentioned: Vec<String>,
    pub intent: Intent,
}

pub enum Intent {
    Query,   // Retrieve data
    Explore, // Browse/understand data
    Analyze, // Compute aggregations/trends
    Compare, // Compare entities
    Filter,  // Refine previous results
    Clarify, // Ask follow-up question
    Export,  // Export results
}

pub struct Focus {
    pub entity_type: EntityType, // Table, Column, Value, Time Range
    pub entity_name: String,
    pub context_from_turn: u32,  // Which turn introduced this focus?
}

impl ContextTracker {
    // Add a new turn
    pub async fn add_turn(&self, context: &mut ConversationContext, turn: ConversationTurn);

    // Resolve references ("that", "it", "the previous query")
    pub async fn resolve_references(&self, query: &str, context: &ConversationContext) -> Result<String>;

    // Extract context-relevant information for prompt
    pub async fn build_context_prompt(&self, context: &ConversationContext) -> String;

    // Detect if context needs clarification
    pub fn needs_clarification(&self, query: &str, context: &ConversationContext) -> Option<Vec<String>>;
}

3.1.4 NL2SQL Engine
Responsibility: Convert natural language to SQL with high accuracy
Architecture:
┌─────────────────────────────────────────────────────────────┐
│                      NL2SQL Pipeline                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. Schema Augmentation                                     │
│     - Add column descriptions, examples, relationships      │
│     - Add domain-specific vocabulary                        │
│     - Add value distributions and statistics                │
│                                                             │
│  2. Query Understanding                                     │
│     - Intent detection (filter, aggregate, join, etc.)      │
│     - Entity extraction (tables, columns, values)           │
│     - Ambiguity detection                                   │
│                                                             │
│  3. SQL Generation                                          │
│     - Few-shot prompting with similar examples              │
│     - Chain-of-thought reasoning                            │
│     - Multi-dialect support (PostgreSQL, MySQL, Oracle)     │
│                                                             │
│  4. Validation & Correction                                 │
│     - Syntax validation                                     │
│     - Semantic validation (column existence, types)         │
│     - Self-correction via LLM feedback loop                 │
│                                                             │
│  5. Optimization                                            │
│     - Query rewriting for performance                       │
│     - Index recommendations                                 │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Implementation:
pub struct Nl2SqlEngine {
    schema_augmenter: Arc<SchemaAugmenter>,
    query_understander: Arc<QueryUnderstander>,
    sql_generator: Arc<SqlGenerator>,
    validator: Arc<SqlValidator>,
    optimizer: Arc<QueryOptimizer>,
    example_store: Arc<ExampleStore>, // For few-shot learning
}

impl Nl2SqlEngine {
    pub async fn generate_sql(
        &self,
        query: &str,
        context: &ConversationContext,
        schema: &DatabaseSchema,
        dialect: SqlDialect,
    ) -> Result<SqlGenerationResult>;
}

pub struct SqlGenerationResult {
    pub sql: String,
    pub confidence: f64,
    pub reasoning: Vec<String>,           // Chain-of-thought steps
    pub validation_passed: bool,
    pub corrections_applied: Vec<String>, // Auto-corrections made
    pub warnings: Vec<String>,
}

Schema Augmentation:
pub struct SchemaAugmenter {
    // Enhance schema with semantic information
    embedding_model: Arc<EmbeddingModel>,
    value_profiler: Arc<ValueProfiler>,
}

pub struct AugmentedSchema {
    pub base_schema: DatabaseSchema,
    pub table_descriptions: HashMap<String, String>,
    pub column_descriptions: HashMap<String, String>,
    pub column_examples: HashMap<String, Vec<String>>, // Sample values
    pub relationships: Vec<Relationship>,
    pub business_terms: HashMap<String, Vec<String>>,  // "revenue" -> ["sales_amount", "total_price"]
    pub value_distributions: HashMap<String, ValueDistribution>,
}

impl SchemaAugmenter {
    // Build augmented schema from database
    pub async fn augment(&self, schema: &DatabaseSchema) -> Result<AugmentedSchema>;

    // Add custom business vocabulary
    pub async fn add_vocabulary(&self, term: &str, mappings: Vec<String>);

    // Profile column values
    pub async fn profile_columns(&self, table: &str, columns: Vec<String>) -> Result<()>;
}

Few-Shot Example Store:
pub struct ExampleStore {
    // Store high-quality NL→SQL examples for few-shot prompting
    examples: Arc<DashMap<String, Vec<Example>>>,
    embedding_model: Arc<EmbeddingModel>,
    vector_index: Arc<HnswIndex>, // For similarity search
}

pub struct Example {
    pub id: String,
    pub natural_language: String,
    pub sql: String,
    pub schema_context: String,
    pub dialect: SqlDialect,
    pub complexity: Complexity,
    pub embedding: Vec<f32>,
}

pub enum Complexity {
    Simple,  // Single table, basic WHERE
    Medium,  // Joins, GROUP BY
    Complex, // Subqueries, CTEs, window functions
}

impl ExampleStore {
    // Find k most similar examples for few-shot prompting
    pub async fn find_similar(&self, query: &str, k: usize) -> Result<Vec<Example>>;

    // Add new example (can learn from successful queries)
    pub async fn add_example(&self, example: Example) -> Result<()>;

    // Load standard examples (BIRD dataset, Spider, WikiSQL)
    pub async fn load_benchmark_examples(&self, dataset: &str) -> Result<usize>;
}

SQL Validator:
pub struct SqlValidator {
    parser: Arc<SqlParser>,
    schema_validator: Arc<SchemaValidator>,
}

pub struct ValidationResult {
    pub valid: bool,
    pub errors: Vec<ValidationError>,
    pub warnings: Vec<String>,
    pub suggestions: Vec<String>,
}

pub enum ValidationError {
    SyntaxError { message: String, position: usize },
    TableNotFound { table: String },
    ColumnNotFound { column: String, table: String },
    TypeMismatch { expected: String, found: String },
    AmbiguousColumn { column: String, tables: Vec<String> },
}

impl SqlValidator {
    // Validate SQL against schema
    pub async fn validate(&self, sql: &str, schema: &AugmentedSchema) -> Result<ValidationResult>;

    // Suggest corrections
    pub async fn suggest_corrections(&self, sql: &str, errors: &[ValidationError]) -> Result<Vec<String>>;
}

3.1.5 Query Explanation Engine
Responsibility: Generate natural language explanations of SQL queries and execution plans
pub struct ExplanationEngine {
    plan_explainer: Arc<PlanExplainer>,
    performance_predictor: Arc<PerformancePredictor>,
    optimizer_suggester: Arc<OptimizerSuggester>,
    nlg_model: Arc<NaturalLanguageGenerator>,
}

impl ExplanationEngine {
    pub async fn explain_query(
        &self,
        sql: &str,
        execution_plan: &ExecutionPlan,
        results: &QueryResults,
    ) -> Result<Explanation>;
}

pub struct Explanation {
    pub summary: String,             // One-sentence summary
    pub detailed_steps: Vec<String>, // Step-by-step explanation
    pub performance_analysis: PerformanceAnalysis,
    pub optimization_suggestions: Vec<OptimizationSuggestion>,
    pub visual_plan: Option<String>, // ASCII art or mermaid diagram
}

pub struct PerformanceAnalysis {
    pub estimated_cost: f64,
    pub estimated_rows: usize,
    pub index_usage: Vec<String>,
    pub bottlenecks: Vec<String>,
    pub parallel_potential: Option<String>,
}

pub struct OptimizationSuggestion {
    pub suggestion: String,
    pub impact: ImpactLevel,
    pub effort: EffortLevel,
    pub sql_rewrite: Option<String>, // Alternative SQL
}

pub enum ImpactLevel {
    High,   // >50% improvement
    Medium, // 10-50% improvement
    Low,    // <10% improvement
}

pub enum EffortLevel {
    Easy,   // Auto-applicable
    Medium, // Requires index creation
    Hard,   // Requires schema changes
}

Plan Explainer:
pub struct PlanExplainer {
    // Convert query plans to natural language
}

impl PlanExplainer {
    // Generate natural language explanation from execution plan
    pub async fn explain_plan(&self, plan: &ExecutionPlan) -> Result<Vec<String>>;

    // Generate visual representation (ASCII tree, mermaid)
    pub fn visualize_plan(&self, plan: &ExecutionPlan, format: VisualizationFormat) -> String;
}

pub enum VisualizationFormat {
    AsciiTree,
    Mermaid,
    Graphviz,
}

3.1.6 Model Router
Responsibility: Route requests to appropriate LLM models (local or cloud)
pub struct ModelRouter {
    models: HashMap<String, Arc<dyn LanguageModel>>,
    routing_strategy: RoutingStrategy,
    fallback_chain: Vec<String>,
}

pub enum RoutingStrategy {
    Primary,           // Always use primary model
    CostOptimized,     // Use cheapest model that meets requirements
    LatencyOptimized,  // Use fastest model
    AccuracyOptimized, // Use most accurate model
    Adaptive,          // Choose based on query complexity
}

// Note: must be object-safe (the router stores Arc<dyn LanguageModel>),
// so in practice the async methods go through #[async_trait].
pub trait LanguageModel: Send + Sync {
    async fn generate(&self, prompt: &str, config: &GenerationConfig) -> Result<String>;
    fn model_name(&self) -> &str;
    fn max_tokens(&self) -> usize;
    fn cost_per_1k_tokens(&self) -> f64;
}

// Implementations
pub struct OpenAIModel { /* GPT-4, GPT-3.5-turbo */ }
pub struct AnthropicModel { /* Claude Sonnet, Opus */ }
pub struct CohereModel { /* Command, Command-Light */ }
pub struct OllamaModel { /* Local models: Llama, Mistral, CodeLlama */ }
pub struct ONNXModel { /* Local ONNX runtime models */ }

impl ModelRouter {
    pub async fn route(&self, prompt: &str, requirements: &ModelRequirements) -> Result<Arc<dyn LanguageModel>>;
}

pub struct ModelRequirements {
    pub max_latency_ms: Option<u64>,
    pub min_accuracy: Option<f64>,
    pub max_cost_per_query: Option<f64>,
    pub require_local: bool, // Must use local model (privacy)
}

3.2 Supporting Components
3.2.1 Schema Cache
pub struct SchemaCache {
    cache: Arc<DashMap<String, Arc<AugmentedSchema>>>,
    ttl: Duration,
}

impl SchemaCache {
    pub async fn get_or_load(&self, database: &str) -> Result<Arc<AugmentedSchema>>;
    pub async fn invalidate(&self, database: &str);
    pub async fn refresh_all(&self) -> Result<()>;
}

3.2.2 Query Cache
pub struct QueryCache {
    // Cache NL→SQL mappings
    cache: Arc<DashMap<String, CachedQuery>>,
    embedding_model: Arc<EmbeddingModel>,
    similarity_threshold: f64, // Default: 0.85
}

pub struct CachedQuery {
    pub nl_query: String,
    pub sql: String,
    pub schema_fingerprint: String, // Invalidate if schema changes
    pub hit_count: AtomicU64,
    pub last_used: DateTime<Utc>,
    pub average_latency_ms: f64,
}

impl QueryCache {
    // Try to find a semantically similar cached query
    pub async fn find_similar(&self, query: &str, threshold: f64) -> Option<CachedQuery>;

    // Add successful query to cache
    pub async fn insert(&self, nl_query: &str, sql: &str, schema_fingerprint: &str);
}

4. Data Flow Architecture
4.1 Query Processing Flow
User Query: "Show me top 10 customers by revenue in 2024"
        │
        ▼
┌─────────────────────────────────────────────────────────────┐
│ 1. Session Management                                       │
│    - Load/create session                                    │
│    - Load conversation context                              │
│    - Check rate limits                                      │
└────────────────┬────────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────────────┐
│ 2. Context Resolution                                       │
│    - Resolve references ("that", "it", "them")              │
│    - Merge with previous context                            │
│    - Detect ambiguities                                     │
└────────────────┬────────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────────────┐
│ 3. Query Cache Lookup (Semantic)                            │
│    - Embed query                                            │
│    - Search for similar queries (>85% similarity)           │
│    - If found: Return cached SQL + explain                  │
└────────────────┬────────────────────────────────────────────┘
                 │ (cache miss)
                 ▼
┌─────────────────────────────────────────────────────────────┐
│ 4. Schema Loading & Augmentation                            │
│    - Load schema from cache or database                     │
│    - Augment with descriptions, examples, relationships     │
│    - Filter relevant tables/columns (reduce token count)    │
└────────────────┬────────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────────────┐
│ 5. Few-Shot Example Retrieval                               │
│    - Find k=5 similar examples from example store           │
│    - Match on complexity, tables, operations                │
└────────────────┬────────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────────────┐
│ 6. Prompt Construction                                      │
│    - System prompt (role, constraints)                      │
│    - Schema context (relevant tables/columns)               │
│    - Few-shot examples (5 similar NL→SQL pairs)             │
│    - Conversation history (last 3-5 turns)                  │
│    - User query                                             │
└────────────────┬────────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────────────┐
│ 7. LLM Inference                                            │
│    - Route to appropriate model (local/cloud)               │
│    - Generate SQL with chain-of-thought reasoning           │
│    - Parse response                                         │
└────────────────┬────────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────────────┐
│ 8. SQL Validation                                           │
│    - Parse SQL (syntax check)                               │
│    - Validate against schema (semantic check)               │
│    - If invalid: Self-correct via LLM feedback loop         │
│    - Max 3 correction attempts                              │
└────────────────┬────────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────────────┐
│ 9. Query Optimization                                       │
│    - Analyze query plan                                     │
│    - Apply rule-based optimizations                         │
│    - Generate optimization suggestions                      │
└────────────────┬────────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────────────┐
│ 10. Query Execution (HeliosDB Compute)                      │
│    - Execute via heliosdb-compute                           │
│    - Collect execution stats                                │
│    - Format results                                         │
└────────────────┬────────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────────────┐
│ 11. Explanation Generation                                  │
│    - Explain query in natural language                      │
│    - Explain execution plan                                 │
│    - Provide performance analysis                           │
│    - Suggest optimizations                                  │
└────────────────┬────────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────────────┐
│ 12. Response Assembly                                       │
│    - SQL + Results + Explanation + Suggestions              │
│    - Add to conversation context                            │
│    - Cache successful query                                 │
│    - Return to user                                         │
└─────────────────────────────────────────────────────────────┘
Output:

SQL:
  SELECT c.customer_name, SUM(o.amount) as revenue
  FROM customers c
  JOIN orders o ON c.id = o.customer_id
  WHERE YEAR(o.order_date) = 2024
  GROUP BY c.id, c.customer_name
  ORDER BY revenue DESC
  LIMIT 10

Explanation:
  "This query finds the top 10 customers by total revenue in 2024. It joins the
  customers and orders tables, filters for 2024 orders, groups by customer,
  sums the order amounts, and returns the top 10."

Performance:
  "Estimated cost: 1250.3, rows: 10
  Uses index: idx_orders_customer_date
  Bottleneck: Full table scan on customers (recommend index on id)"

Suggestions:
- "Add index on customers(id) for 50% faster joins (High impact, Easy)"
- "Consider materialized view for monthly revenue (Medium impact, Medium effort)"

4.2 Multi-Turn Conversation Flow
Turn 1: "Show me sales by region"
  → SQL: SELECT region, SUM(amount) FROM sales GROUP BY region
  → Context: Focus on "sales table", "region column"

Turn 2: "Which region had the highest growth?"
  → Reference resolution: "region" refers to previous query
  → Need time comparison: Clarify "growth compared to what period?"
  → Response: Clarifying question

Turn 3: "Compared to last year"
  → SQL: WITH current AS (...), previous AS (...)
         SELECT ... FROM current JOIN previous ...
  → Context: Time range comparison, "year-over-year growth"

Turn 4: "Break that down by product"
  → Reference: "that" = previous query (regional growth)
  → SQL: Add product dimension to previous query
  → Context: Multi-dimensional analysis (region × product × time)

5. Integration Points
5.1 HeliosDB Compute Integration
// heliosdb-compute provides query execution
use heliosdb_compute::{QueryExecutor, ExecutionPlan, QueryResult};

impl ConversationalBiEngine {
    // Returns both the plan (for explanation) and the result
    async fn execute_sql(&self, sql: &str, database: &str) -> Result<(ExecutionPlan, QueryResult)> {
        let executor = self.compute_engine.create_executor(database).await?;

        // Get execution plan for explanation
        let plan = executor.explain(sql).await?;

        // Execute query
        let result = executor.execute(sql).await?;

        Ok((plan, result))
    }
}

5.2 Schema Introspection
// Needs the heliosdb-schema crate (create it if it doesn't exist)
use heliosdb_schema::{SchemaIntrospector, DatabaseSchema};

pub struct SchemaIntrospector {
    metadata_service: Arc<MetadataService>,
}

impl SchemaIntrospector {
    pub async fn get_schema(&self, database: &str) -> Result<DatabaseSchema> {
        // Query information_schema or system catalogs
        let tables = self.get_tables(database).await?;
        let columns = self.get_columns(database).await?;
        let relationships = self.infer_relationships(database).await?;
        let indexes = self.get_indexes(database).await?;

        Ok(DatabaseSchema {
            tables,
            columns,
            relationships,
            indexes,
        })
    }
}

pub struct DatabaseSchema {
    pub tables: Vec<Table>,
    pub columns: Vec<Column>,
    pub relationships: Vec<Relationship>,
    pub indexes: Vec<Index>,
}

5.3 ML Model Serving Integration
// heliosdb-ml provides model serving
use heliosdb_ml::{MlEngine, ModelConfig};

impl Nl2SqlEngine {
    async fn load_local_model(&self, model_path: &str) -> Result<Arc<ONNXModel>> {
        let ml_engine = MlEngine::new().await?;

        let config = ModelConfig {
            name: "nl2sql_local".to_string(),
            path: model_path.to_string(),
            format: ModelFormat::Onnx,
            input_columns: vec!["input_text".to_string()],
            output_column: "sql_output".to_string(),
            version: "1.0".to_string(),
            metadata: Default::default(),
        };

        ml_engine.register_model(config).await?;

        Ok(Arc::new(ONNXModel::new(ml_engine)))
    }
}

5.4 Monitoring Integration
// heliosdb-monitoring for observability (use heliosdb-common metrics)
use heliosdb_common::metrics::{MetricsCollector, Histogram, Counter};

pub struct ConversationalMetrics {
    query_latency: Histogram,
    cache_hit_rate: Counter,
    validation_failures: Counter,
    correction_attempts: Counter,
    model_inference_latency: Histogram,
}

impl ConversationalBiEngine {
    fn record_metrics(&self, response: &ConversationResponse) {
        self.metrics.query_latency.observe(response.latency_ms);
        self.metrics.cache_hit_rate.inc_if(response.from_cache);
        // ... more metrics
    }
}

6. Multi-Turn Context Management
6.1 Context Structure
pub struct ConversationContext {
    // Conversation history (bounded deque)
    pub turns: VecDeque<ConversationTurn>,

    // Entity tracking
    pub mentioned_tables: HashSet<String>,
    pub mentioned_columns: HashMap<String, String>,     // column -> table
    pub mentioned_values: HashMap<String, Vec<String>>, // column -> values

    // Current focus (what is the user looking at?)
    pub current_tables: Vec<String>,
    pub current_filters: Vec<Filter>,
    pub current_aggregations: Vec<Aggregation>,
    pub current_time_range: Option<TimeRange>,

    // Reference tracking (for "it", "that", "them")
    pub last_entity: Option<Entity>,
    pub last_result_columns: Vec<String>,

    // Intent tracking
    pub conversation_goal: Option<ConversationGoal>,
}

pub enum ConversationGoal {
    Exploration, // User is exploring data
    Analysis,    // User is performing specific analysis
    Reporting,   // User is building a report
    Debugging,   // User is investigating an issue
}

6.2 Reference Resolution
Pronoun Resolution:
impl ContextTracker {
    pub async fn resolve_pronouns(&self, query: &str, context: &ConversationContext) -> String {
        // "Show me that by product" → "Show me [regional sales] by product"
        let Some(entity) = context.last_entity.as_ref() else {
            // Nothing to resolve against; return the query unchanged
            return query.to_string();
        };
        let entity_name = entity.to_string();

        // Replace whole-word pronouns only; a naive str::replace would also
        // rewrite substrings (e.g. the "it" inside "with").
        query
            .split_whitespace()
            .map(|word| {
                if matches!(word, "that" | "it" | "them") {
                    entity_name.as_str()
                } else {
                    word
                }
            })
            .collect::<Vec<_>>()
            .join(" ")
    }
}

Implicit Context:
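One way to implement this carry-over is to keep the previous turn's query shape and swap only the slot the follow-up changes. A minimal sketch, using a hypothetical simplified `QueryShape` rather than the engine's real context types:

```rust
/// Hypothetical, simplified record of a previous turn's query;
/// the real engine tracks this inside ConversationContext.
#[derive(Clone, Debug)]
struct QueryShape {
    metric: String,                 // e.g. "sales"
    filters: Vec<(String, String)>, // (column, value) pairs
}

/// "What about New York?" → reuse the prior shape, replace one filter value.
fn apply_implicit_followup(prev: &QueryShape, column: &str, new_value: &str) -> QueryShape {
    let mut next = prev.clone();
    for filter in &mut next.filters {
        if filter.0 == column {
            filter.1 = new_value.to_string(); // swap only the changed slot
        }
    }
    next
}

fn main() {
    // Turn 1: "Show sales in California"
    let turn1 = QueryShape {
        metric: "sales".to_string(),
        filters: vec![("region".to_string(), "California".to_string())],
    };
    // Turn 2: "What about New York?" — same structure, new region
    let turn2 = apply_implicit_followup(&turn1, "region", "New York");
    assert_eq!(turn2.filters[0].1, "New York");
    assert_eq!(turn2.metric, "sales"); // structure carried forward
    println!("{turn2:?}");
}
```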
Turn 1: "Show sales in California"
        Context: region = "California"

Turn 2: "What about New York?"
        Implicit: Keep the same query structure, change the region filter
        Resolved: "Show sales in New York"

6.3 Clarification Detection
pub struct ClarificationEngine {
    ambiguity_detector: Arc<AmbiguityDetector>,
}

impl ClarificationEngine {
    pub fn detect_ambiguities(&self, query: &str, context: &ConversationContext) -> Vec<Clarification> {
        let mut clarifications = Vec::new();

        // Time range ambiguity
        if contains_time_reference(query) && context.current_time_range.is_none() {
            clarifications.push(Clarification {
                question: "Which time period are you interested in?".to_string(),
                suggestions: vec![
                    "This month".to_string(),
                    "Last quarter".to_string(),
                    "This year".to_string(),
                    "Custom range".to_string(),
                ],
            });
        }

        // Column ambiguity (multiple columns with the same name)
        if let Some(ambiguous_columns) = self.find_ambiguous_columns(query, &context.mentioned_tables) {
            clarifications.push(Clarification {
                question: format!("Which '{}' did you mean?", ambiguous_columns[0].column),
                suggestions: ambiguous_columns
                    .iter()
                    .map(|c| format!("{}.{}", c.table, c.column))
                    .collect(),
            });
        }

        clarifications
    }
}

7. NL2SQL Engine Design
7.1 Prompt Engineering Strategy
System Prompt Template:
You are an expert SQL query generator for HeliosDB.
Your task is to convert natural language questions into accurate SQL queries.
CRITICAL RULES:
1. Only use tables and columns that exist in the provided schema
2. Always qualify column names with table aliases when joining tables
3. Use appropriate aggregate functions (SUM, AVG, COUNT, etc.)
4. Format dates using the database's date functions
5. Include appropriate WHERE clauses for filters
6. Use JOINs instead of subqueries when possible for performance
7. Add LIMIT clauses for large result sets unless explicitly asked for all data
8. Use proper GROUP BY when using aggregate functions

QUERY STRUCTURE:
- Start with a clear understanding of the question
- Identify the required tables and columns
- Determine necessary JOINs
- Apply filters (WHERE clause)
- Add aggregations if needed (GROUP BY, HAVING)
- Sort results appropriately (ORDER BY)
- Limit results if appropriate

IMPORTANT:
- If the question is ambiguous, generate the most likely interpretation and note the assumption
- If critical information is missing, indicate what clarification is needed
- Always validate your SQL against the schema before returning it

SQL DIALECT: {dialect} (PostgreSQL/MySQL/Oracle)

Few-Shot Examples:
Example 1:
Question: "What are the top 5 selling products last month?"
Schema: products(id, name), orders(id, product_id, quantity, order_date)
SQL:
SELECT p.name, SUM(o.quantity) as total_quantity
FROM products p
JOIN orders o ON p.id = o.product_id
WHERE o.order_date >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month')
  AND o.order_date < DATE_TRUNC('month', CURRENT_DATE)
GROUP BY p.id, p.name
ORDER BY total_quantity DESC
LIMIT 5;

Example 2:
Question: "Show me customers who spent more than $1000"
Schema: customers(id, name, email), orders(id, customer_id, total_amount)
SQL:
SELECT c.name, c.email, SUM(o.total_amount) as total_spent
FROM customers c
JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name, c.email
HAVING SUM(o.total_amount) > 1000
ORDER BY total_spent DESC;

... (3 more examples selected by query similarity)

Context Integration:
CONVERSATION HISTORY:
Turn 1: User asked about "sales by region"
        SQL: SELECT region, SUM(amount) FROM sales GROUP BY region

Turn 2: User asked "which had the highest?"
        Interpretation: Referring to regions from previous query

CURRENT CONTEXT:
- Focused tables: sales
- Focused columns: region, amount
- Time range: Not specified (needs clarification OR assume current year)
- Last entity: "region"

CURRENT QUESTION: "{user_query}"

TASK: Generate a SQL query considering the conversation context.
If referring to previous results, build upon the previous query.

7.2 Self-Correction Loop
pub struct SelfCorrectionEngine {
    max_attempts: usize, // Default: 3
}

impl SelfCorrectionEngine {
    pub async fn correct_sql(
        &self,
        sql: &str,
        schema: &AugmentedSchema,
        model: &Arc<dyn LanguageModel>,
    ) -> Result<String> {
        let mut current_sql = sql.to_string();

        for attempt in 1..=self.max_attempts {
            // Re-validate each candidate; the loop's errors drive the next prompt
            let validation = self.validator.validate(&current_sql, schema).await?;

            if validation.valid {
                return Ok(current_sql);
            }

            // Build correction prompt
            let correction_prompt = format!(
                "The following SQL query has errors:\n\n{}\n\nErrors:\n{}\n\n\
                 Schema:\n{}\n\n\
                 Please correct the SQL query to fix these errors.",
                current_sql,
                validation.errors.iter().map(|e| format!("- {}", e)).collect::<Vec<_>>().join("\n"),
                schema.to_prompt_string()
            );

            current_sql = model.generate(&correction_prompt, &GenerationConfig::default()).await?;

            tracing::info!("Self-correction attempt {}: {}", attempt, current_sql);
        }

        Err(anyhow::anyhow!("Failed to correct SQL after {} attempts", self.max_attempts))
    }
}

7.3 Complex Query Handling
Subqueries and CTEs:
Question: "Show products with above-average prices in each category"
Generated SQL:

```sql
WITH category_avg AS (
    SELECT category_id, AVG(price) as avg_price
    FROM products
    GROUP BY category_id
)
SELECT p.name, p.price, c.name as category, ca.avg_price
FROM products p
JOIN categories c ON p.category_id = c.id
JOIN category_avg ca ON p.category_id = ca.category_id
WHERE p.price > ca.avg_price
ORDER BY c.name, p.price DESC;
```

Window Functions:
Question: "Rank employees by salary within each department"
Generated SQL:

```sql
SELECT e.name,
       d.name as department,
       e.salary,
       RANK() OVER (PARTITION BY e.department_id ORDER BY e.salary DESC) as rank
FROM employees e
JOIN departments d ON e.department_id = d.id
ORDER BY d.name, rank;
```

8. Query Explanation System
8.1 Explanation Templates
Simple Query Explanation:
SQL: SELECT * FROM customers WHERE country = 'USA'
Explanation:

```
"This query retrieves all customer records from the United States.

Step-by-step:
1. Scan the customers table
2. Filter rows where country equals 'USA'
3. Return all columns for matching rows

Performance:
- Estimated rows: 1,250 (out of 10,000 total customers)
- Cost: 125.3
- Index used: idx_customers_country (excellent!)

Suggestions:
- Consider naming specific columns instead of SELECT * to reduce data transfer
  Example: SELECT name, email, city FROM customers WHERE country = 'USA'"
```

Complex Query Explanation:
SQL:

```sql
WITH monthly_revenue AS (
    SELECT DATE_TRUNC('month', order_date) as month,
           SUM(amount) as revenue
    FROM orders
    WHERE order_date >= '2024-01-01'
    GROUP BY 1
)
SELECT month,
       revenue,
       revenue - LAG(revenue) OVER (ORDER BY month) as growth
FROM monthly_revenue
ORDER BY month;
```

Explanation:

```
"This query calculates monthly revenue for 2024 and shows month-over-month growth.

Step-by-step:
1. Create a temporary result set (CTE) called 'monthly_revenue':
   - Filter orders from 2024 onwards
   - Group by month
   - Sum the order amounts for each month

2. Calculate month-over-month growth:
   - Use the LAG() window function to get the previous month's revenue
   - Subtract the previous month from the current month

3. Sort results by month chronologically

Performance:
- Estimated cost: 2,450.8
- Estimated rows: 12 (one per month)
- Full table scan on orders (200,000 rows)

Bottleneck:
- The date filter on the orders table requires a full scan

Suggestions:
1. Add an index on order_date for 85% faster execution (High impact, Easy)
   SQL: CREATE INDEX idx_orders_date ON orders(order_date);

2. Consider partitioning the orders table by month for even better performance
   (High impact, Medium effort)"
```

8.2 Visual Query Plan
ASCII Tree Format:
```
└─ Sort (ORDER BY month)
   └─ Window Aggregate (LAG revenue)
      └─ CTE Scan: monthly_revenue
         └─ Aggregate (GROUP BY month, SUM amount)
            └─ Filter (order_date >= '2024-01-01')
               └─ Seq Scan: orders
                  Cost: 2450.8, Rows: 200000
```

Mermaid Diagram (for web UI):

```mermaid
graph TD
    A[Seq Scan: orders] -->|200,000 rows| B[Filter: date >= 2024]
    B -->|50,000 rows| C[Group By: month]
    C -->|12 rows| D[Window: LAG]
    D -->|12 rows| E[Sort: month]
```

9. Model Architecture
9.1 Supported Models
| Model | Type | Cost/1M tokens | Latency | Accuracy | Use Case |
|---|---|---|---|---|---|
| GPT-4 | Cloud | $30 | 3-5s | 95% | Production (high accuracy) |
| GPT-3.5-turbo | Cloud | $2 | 1-2s | 85% | Development/testing |
| Claude 3 Sonnet | Cloud | $15 | 2-3s | 93% | Production (balanced) |
| Claude 3 Haiku | Cloud | $1.25 | 0.5-1s | 82% | High-throughput |
| Cohere Command | Cloud | $15 | 2-3s | 88% | Alternative option |
| Llama 3 70B (Ollama) | Local | Free | 5-10s | 82% | Privacy-sensitive |
| CodeLlama 34B | Local | Free | 3-5s | 85% | SQL-focused |
| Custom ONNX | Local | Free | 1-2s | 75-85% | Fine-tuned domain |
9.2 Model Selection Algorithm
```rust
impl ModelRouter {
    pub async fn select_model(
        &self,
        requirements: &ModelRequirements,
        query_complexity: Complexity,
    ) -> Arc<dyn LanguageModel> {
        // Priority 1: Privacy requirements
        if requirements.require_local {
            return self.select_best_local_model(query_complexity);
        }

        // Priority 2: Latency requirements
        if let Some(max_latency) = requirements.max_latency_ms {
            if max_latency < 1000 {
                return self.models.get("claude-haiku").unwrap().clone();
            }
        }

        // Priority 3: Cost optimization
        if let Some(max_cost) = requirements.max_cost_per_query {
            return self.select_cheapest_model(max_cost, query_complexity);
        }

        // Priority 4: Accuracy optimization
        match query_complexity {
            Complexity::Simple => self.models.get("gpt-3.5-turbo").unwrap().clone(),
            Complexity::Medium => self.models.get("claude-sonnet").unwrap().clone(),
            Complexity::Complex => self.models.get("gpt-4").unwrap().clone(),
        }
    }
}
```

9.3 Fine-Tuning Strategy
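The fine-tuning pipeline in this section collects schema/question/SQL triples, but the design leaves the concrete export format open. As one hedged possibility, here is a std-only sketch that emits OpenAI-style chat JSONL; the message layout and the `json_escape` helper are illustrative assumptions, not part of the design above.

```rust
/// Minimal training example, mirroring the fields used elsewhere in this design.
pub struct TrainingExample {
    pub schema: String,
    pub nl_query: String,
    pub sql: String,
}

/// Escape a string for embedding in a JSON string literal.
/// Handles quotes, backslashes, and newlines: enough for this sketch.
fn json_escape(s: &str) -> String {
    s.chars()
        .flat_map(|c| match c {
            '"' => vec!['\\', '"'],
            '\\' => vec!['\\', '\\'],
            '\n' => vec!['\\', 'n'],
            c => vec![c],
        })
        .collect()
}

/// Render examples as JSONL: one {"messages": [...]} object per line,
/// in an OpenAI-style chat format (an assumed target, not mandated here).
pub fn export_jsonl(examples: &[TrainingExample]) -> String {
    examples
        .iter()
        .map(|ex| {
            format!(
                r#"{{"messages":[{{"role":"system","content":"Schema: {}"}},{{"role":"user","content":"{}"}},{{"role":"assistant","content":"{}"}}]}}"#,
                json_escape(&ex.schema),
                json_escape(&ex.nl_query),
                json_escape(&ex.sql)
            )
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

In practice the `export_training_data` method would call something like this per schema bucket, filtered to `validated: true` examples.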
Domain Adaptation:
```rust
pub struct FineTuningManager {
    training_data: Arc<DashMap<String, Vec<TrainingExample>>>,
}

pub struct TrainingExample {
    pub schema: String,
    pub nl_query: String,
    pub sql: String,
    pub validated: bool, // Human-validated correct query
}

impl FineTuningManager {
    // Collect successful queries for fine-tuning
    pub async fn collect_example(&self, schema: &str, nl: &str, sql: &str) {
        let example = TrainingExample {
            schema: schema.to_string(),
            nl_query: nl.to_string(),
            sql: sql.to_string(),
            validated: false,
        };

        self.training_data
            .entry(schema.to_string())
            .or_insert_with(Vec::new)
            .push(example);
    }

    // Export to JSON, JSONL, or other formats for fine-tuning.
    // Output can be fed to the OpenAI fine-tuning API or local training.
    pub async fn export_training_data(&self, format: TrainingFormat) -> Result<String> {
        todo!("serialize collected examples in the requested format")
    }
}
```

10. Performance Optimization
10.1 Caching Strategy
Multi-Level Cache:
L1: In-Memory LRU Cache (hot queries) - Size: 10,000 queries - TTL: 1 hour - Hit rate: 40-50%
L2: Redis Cache (warm queries) - Size: 100,000 queries - TTL: 24 hours - Hit rate: 30-40%
L3: Semantic Similarity Cache - Embedding-based retrieval - Threshold: 0.85 cosine similarity - Hit rate: 20-30%
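The L3 layer above can be sketched with plain vectors. Below is a minimal, std-only illustration of threshold-based cosine lookup; embedding generation itself is out of scope and assumed to happen upstream, and the `SemanticCache` type is illustrative rather than the production design.

```rust
/// Cached entries pair a query embedding with the SQL generated for it.
pub struct SemanticCache {
    entries: Vec<(Vec<f32>, String)>,
    threshold: f32, // cosine similarity cutoff, e.g. 0.85
}

/// Cosine similarity between two equal-length vectors.
/// Assumes no NaN components (embeddings are finite).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

impl SemanticCache {
    pub fn new(threshold: f32) -> Self {
        Self { entries: Vec::new(), threshold }
    }

    pub fn insert(&mut self, embedding: Vec<f32>, sql: String) {
        self.entries.push((embedding, sql));
    }

    /// Return the cached SQL of the most similar query above the threshold, if any.
    pub fn lookup(&self, embedding: &[f32]) -> Option<&str> {
        self.entries
            .iter()
            .map(|(e, sql)| (cosine(e, embedding), sql))
            .filter(|(sim, _)| *sim >= self.threshold)
            .max_by(|a, b| a.0.partial_cmp(&b.0).unwrap())
            .map(|(_, sql)| sql.as_str())
    }
}
```

A real implementation would use an ANN index rather than a linear scan, but the hit/miss semantics are the same.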
Total cache hit rate: 90-95%

10.2 Latency Optimization
Parallel Processing:
```rust
impl ConversationalBiEngine {
    pub async fn process_query_parallel(
        &self,
        session_id: SessionId,
        query: &str,
    ) -> Result<ConversationResponse> {
        let session = self.session_manager.get(session_id)?;

        // Load schema, few-shot examples, and conversation context in parallel
        let (schema, examples, context) = tokio::join!(
            self.schema_cache.get_or_load(&session.database),
            self.example_store.find_similar(query, 5),
            self.context_tracker.build_context_prompt(&session.context),
        );
        let (schema, examples, context) = (schema?, examples?, context?);

        // Build prompt (fast, <10ms)
        let prompt = self.build_prompt(query, &schema, &examples, &context);

        // LLM inference (slowest step, 1-5s)
        let sql = self
            .model_router
            .route(&prompt, &requirements)
            .await?
            .generate(&prompt)
            .await?;

        // Validate and explain in parallel
        let (validation, plan) = tokio::join!(
            self.validator.validate(&sql, &schema),
            self.compute_engine.explain(&sql),
        );

        // ...
    }
}
```

Token Optimization:
```rust
impl SchemaAugmenter {
    // Reduce the schema to only the tables/columns relevant to this query.
    // Cuts token count by 70-90% and improves latency 2-3x.
    pub fn filter_relevant_schema(&self, schema: &AugmentedSchema, query: &str) -> AugmentedSchema {
        let mentioned_tables = self.extract_likely_tables(query, schema);

        let filtered_tables = schema.tables.iter()
            .filter(|t| mentioned_tables.contains(&t.name))
            .cloned()
            .collect();

        AugmentedSchema { tables: filtered_tables, ..schema.clone() }
    }
}
```

10.3 Throughput Optimization
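Section 10.3 leans on pooling and batching to cut per-request overhead. As a minimal, std-only sketch of the grouping step that batching needs, queued queries can be bucketed by the model they were routed to, so each provider receives one batched call instead of N round trips; the `QueuedQuery` type here is an illustrative assumption, not part of the design above.

```rust
use std::collections::HashMap;

/// Stand-in model identifier for illustration.
type ModelId = String;

/// A queued request: which model it was routed to, and the prompt to send.
pub struct QueuedQuery {
    pub model: ModelId,
    pub prompt: String,
}

/// Group queued queries by target model so each provider receives a single
/// batched request. Insertion order of prompts within a bucket is preserved.
pub fn group_by_model(queue: Vec<QueuedQuery>) -> HashMap<ModelId, Vec<String>> {
    let mut batches: HashMap<ModelId, Vec<String>> = HashMap::new();
    for q in queue {
        batches.entry(q.model).or_default().push(q.prompt);
    }
    batches
}
```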
Connection Pooling:
```rust
pub struct SessionPool {
    pool: Pool<SessionConnection>,
    max_connections: usize, // Default: 1000
}

// Reuse LLM connections to reduce per-request overhead
```

Batching:

```rust
// Batch multiple queries destined for the same model. Grouped requests
// reduce network overhead: a 30-50% latency reduction in
// high-throughput scenarios.
pub async fn process_batch(&self, queries: Vec<(SessionId, String)>) -> Result<Vec<ConversationResponse>> {
    // Group by model, then send each group as a single batched request
    todo!()
}
```

11. Testing Strategy
11.1 Accuracy Testing
Benchmark Datasets:
- BIRD (BIg Bench for LaRge Database Grounded Text-to-SQL Evaluation)
  - 12,751 unique questions
  - 95 databases
  - Complex queries (CTEs, window functions, nested subqueries)
  - Target: 95%+ execution accuracy
- Spider
  - 10,181 questions
  - 200 databases
  - Medium complexity
  - Target: 98%+ accuracy
- WikiSQL
  - 80,654 questions
  - Simple queries
  - Target: 99%+ accuracy
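Execution accuracy, the headline BIRD metric above, compares query results rather than SQL text. The design does not pin down the comparison rule, so the sketch below uses a common simplification: treat result sets as order-insensitive multisets of rows (queries with an explicit ORDER BY would need an order-sensitive variant).

```rust
/// Compare two result sets as multisets of rows: equal if they contain the
/// same rows, regardless of row order. Rows are modeled as vectors of
/// stringified cell values for simplicity.
pub fn execution_match(mut a: Vec<Vec<String>>, mut b: Vec<Vec<String>>) -> bool {
    a.sort();
    b.sort();
    a == b
}
```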
Accuracy Metrics:
```rust
pub struct AccuracyMetrics {
    pub exact_match: f64,     // SQL exactly matches gold standard
    pub execution_match: f64, // Results match (more lenient)
    pub valid_sql: f64,       // SQL is syntactically valid
    pub schema_valid: f64,    // SQL is semantically valid for schema
}

impl AccuracyTester {
    pub async fn evaluate_on_bird(&self) -> Result<AccuracyMetrics> {
        let bird_dataset = self.load_bird_dataset()?;
        let mut metrics = AccuracyMetrics::default();

        for example in &bird_dataset {
            let generated_sql = self.engine.generate_sql(&example.question, &example.schema).await?;

            if generated_sql == example.gold_sql {
                metrics.exact_match += 1.0;
            }

            let generated_results = self.execute(&generated_sql)?;
            let gold_results = self.execute(&example.gold_sql)?;

            if generated_results == gold_results {
                metrics.execution_match += 1.0;
            }
        }

        metrics.normalize(bird_dataset.len());
        Ok(metrics)
    }
}
```

11.2 Multi-Turn Testing
```rust
pub struct MultiTurnTestCase {
    pub conversation: Vec<ConversationTurn>,
    pub expected_sql: Vec<String>,
    pub test_context_retention: bool,
    pub test_reference_resolution: bool,
}

// Example test case
let test = MultiTurnTestCase {
    conversation: vec![
        ConversationTurn {
            query: "Show me sales by region",
            expected_sql: "SELECT region, SUM(amount) FROM sales GROUP BY region",
        },
        ConversationTurn {
            query: "Which had the highest?",
            expected_sql: "SELECT region, SUM(amount) as total FROM sales \
                           GROUP BY region ORDER BY total DESC LIMIT 1",
        },
        ConversationTurn {
            query: "Show that by month",
            expected_sql: "SELECT DATE_TRUNC('month', sale_date) as month, region, SUM(amount) \
                           FROM sales WHERE region = 'West' GROUP BY month, region",
        },
    ],
    test_context_retention: true,
    test_reference_resolution: true,
};
```

11.3 Performance Testing
```rust
pub struct PerformanceBenchmark {
    latency_p50: Duration,
    latency_p95: Duration,
    latency_p99: Duration,
    throughput_qps: f64,
    cache_hit_rate: f64,
}

impl PerformanceTester {
    pub async fn benchmark(&self, concurrency: usize, duration: Duration) -> PerformanceBenchmark {
        // Shared, thread-safe latency log for the concurrent workers
        let latencies = Arc::new(Mutex::new(Vec::new()));
        let start = Instant::now();

        // Run concurrent query loops
        let tasks: Vec<_> = (0..concurrency)
            .map(|_| {
                let latencies = Arc::clone(&latencies);
                let engine = Arc::clone(&self.engine);
                tokio::spawn(async move {
                    while start.elapsed() < duration {
                        let query_start = Instant::now();
                        // random_session()/random_query() are test helpers
                        let _ = engine.process_query(random_session(), random_query()).await;
                        latencies.lock().unwrap().push(query_start.elapsed());
                    }
                })
            })
            .collect();

        // Wait for all workers
        for task in tasks {
            task.await.unwrap();
        }

        // Calculate percentile metrics
        let mut latencies = Arc::try_unwrap(latencies).unwrap().into_inner().unwrap();
        latencies.sort();
        let total_queries = latencies.len();
        PerformanceBenchmark {
            latency_p50: latencies[total_queries / 2],
            latency_p95: latencies[total_queries * 95 / 100],
            latency_p99: latencies[total_queries * 99 / 100],
            throughput_qps: total_queries as f64 / duration.as_secs_f64(),
            cache_hit_rate: self.query_cache.hit_rate(),
        }
    }
}
```

11.4 SQL Dialect Testing
```rust
pub struct DialectTestSuite {
    dialects: Vec<SqlDialect>,
}

impl DialectTestSuite {
    pub async fn test_all_dialects(&self) -> Result<()> {
        for dialect in &self.dialects {
            let examples = self.load_dialect_specific_examples(dialect);

            for example in examples {
                let sql = self.engine.generate_sql(&example.query, dialect).await?;

                // Validate SQL is valid for this dialect
                assert!(self.validate_dialect_sql(&sql, dialect));

                // Check dialect-specific features
                match dialect {
                    SqlDialect::PostgreSQL => {
                        // Test RETURNING, CTEs, window functions
                    }
                    SqlDialect::MySQL => {
                        // Test LIMIT syntax, ON DUPLICATE KEY UPDATE
                    }
                    SqlDialect::Oracle => {
                        // Test ROWNUM, CONNECT BY, etc.
                    }
                }
            }
        }
        Ok(())
    }
}
```

12. Implementation Roadmap
12.1 Phase 1: Foundation (3 weeks, $240K)
Week 1: Core Infrastructure
- Create `heliosdb-conversational-bi` crate
- Implement `SessionManager` with basic CRUD
- Implement `ConversationContext` data structures
- Implement `SchemaCache` with TTL
- Deliverable: Basic session management working
- Team: 1 Senior Engineer, 1 Mid-Level Engineer
Week 2: Schema Integration
- Create `heliosdb-schema` crate (schema introspection)
- Implement `SchemaAugmenter` (descriptions, examples)
- Integrate with existing `heliosdb-metadata`
- Build schema → prompt converter
- Deliverable: Schema augmentation pipeline
- Team: 1 Senior Engineer, 1 Mid-Level Engineer
Week 3: Model Integration
- Implement `ModelRouter` with OpenAI, Anthropic, Cohere clients
- Implement local model support (Ollama integration)
- Build prompt template system
- Implement basic NL→SQL generation (no context yet)
- Deliverable: Single-turn NL2SQL working with 70%+ accuracy
- Team: 1 Senior ML Engineer, 1 Mid-Level Engineer
12.2 Phase 2: NL2SQL Engine (4 weeks, $320K)
Week 4: Query Understanding
- Implement `QueryUnderstander` (intent, entity extraction)
- Build `EntityExtractor` using NER models
- Implement ambiguity detection
- Deliverable: Intent and entity extraction working
- Team: 1 Senior ML Engineer, 1 Mid-Level Engineer
Week 5: SQL Generation
- Implement `SqlGenerator` with chain-of-thought prompting
- Build `ExampleStore` with vector similarity search
- Load BIRD, Spider, WikiSQL examples
- Implement few-shot example retrieval
- Deliverable: Complex SQL generation (CTEs, window functions)
- Team: 1 Senior Engineer, 1 ML Engineer
Week 6: Validation & Correction
- Implement `SqlValidator` (syntax + semantic)
- Build `SelfCorrectionEngine` with feedback loop
- Implement error→correction prompt generation
- Deliverable: Self-correcting SQL generation, 85%+ accuracy
- Team: 1 Senior Engineer, 1 Mid-Level Engineer
Week 7: Multi-Dialect Support
- Extend to PostgreSQL, MySQL, Oracle dialects
- Implement dialect-specific transformations
- Build dialect-specific validation rules
- Deliverable: Multi-dialect SQL generation
- Team: 1 Senior Engineer
12.3 Phase 3: Multi-Turn Context (3 weeks, $240K)
Week 8: Context Tracking
- Implement `ContextTracker` with entity tracking
- Build reference resolution ("it", "that", "them")
- Implement conversation history summarization
- Deliverable: 5+ turn conversations working
- Team: 1 Senior Engineer, 1 Mid-Level Engineer
Week 9: Clarification Engine
- Implement `ClarificationEngine` (ambiguity detection)
- Build clarifying question generation
- Implement interactive clarification flow
- Deliverable: System asks clarifying questions when needed
- Team: 1 Senior Engineer
Week 10: Context Optimization
- Optimize context window usage (token reduction)
- Implement semantic compression
- Build context pruning strategies
- Deliverable: 10+ turn conversations within token budget
- Team: 1 ML Engineer
12.4 Phase 4: Query Explanation (2 weeks, $160K)
Week 11: Plan Explanation
- Implement `ExplanationEngine`
- Build `PlanExplainer` (natural language from execution plan)
- Implement visual plan generation (ASCII, Mermaid)
- Deliverable: Natural language query explanations
- Team: 1 Senior Engineer, 1 Mid-Level Engineer
Week 12: Optimization Suggestions
- Implement `OptimizerSuggester`
- Build index recommendation engine
- Implement query rewrite suggestions
- Deliverable: Actionable optimization suggestions
- Team: 1 Senior Engineer (Database Optimization background)
12.5 Phase 5: Production Hardening & Testing (2.5 weeks, $240K)
Week 13: Performance Optimization
- Implement query caching (semantic similarity)
- Optimize prompt construction (parallel loading)
- Implement batching for throughput
- Deliverable: <2s p50 latency, >80% cache hit rate
- Team: 1 Senior Engineer, 1 Performance Engineer
Week 14-15: Testing & Benchmarking
- Run BIRD benchmark suite (target 95%+)
- Run Spider benchmark suite (target 98%+)
- Multi-turn conversation testing (100 test cases)
- Performance benchmarking (latency, throughput)
- Security testing (SQL injection prevention, access control)
- Deliverable: All benchmarks passing, production-ready
- Team: 2 QA Engineers, 1 Security Engineer
12.6 Timeline Summary
| Phase | Duration | Cost | Deliverables |
|---|---|---|---|
| Phase 1: Foundation | 3 weeks | $240K | Session mgmt, schema integration, basic NL2SQL |
| Phase 2: NL2SQL Engine | 4 weeks | $320K | Advanced SQL generation, validation, 85%+ accuracy |
| Phase 3: Multi-Turn Context | 3 weeks | $240K | 10+ turn conversations, reference resolution |
| Phase 4: Query Explanation | 2 weeks | $160K | Natural language explanations, optimization tips |
| Phase 5: Production Hardening | 2.5 weeks | $240K | Performance tuning, benchmarking, 95%+ accuracy |
| TOTAL | 14.5 weeks (~3.5 months) | $1.2M | Production-ready Conversational BI |
Buffer: 2 weeks, $200K (contingency) Total with buffer: 16.5 weeks, $1.4M
13. Success Metrics
13.1 Accuracy Metrics
| Metric | Target | Measurement Method |
|---|---|---|
| BIRD Execution Accuracy | 95%+ | Run against full BIRD dataset (12,751 questions) |
| Spider Accuracy | 98%+ | Run against Spider dataset (10,181 questions) |
| WikiSQL Accuracy | 99%+ | Run against WikiSQL dataset (80,654 questions) |
| Exact Match Rate | 60%+ | Generated SQL exactly matches gold standard |
| Multi-Turn Accuracy | 90%+ | Correct SQL in 10+ turn conversations |
13.2 Performance Metrics
| Metric | Target | Measurement Method |
|---|---|---|
| Latency (p50) | <2s | Measure query generation time |
| Latency (p99) | <5s | Measure query generation time |
| Throughput | 100+ QPS | Concurrent session testing |
| Cache Hit Rate | 80%+ | Query cache effectiveness |
| Availability | 99.9% | Uptime monitoring |
13.3 User Experience Metrics
| Metric | Target | Measurement Method |
|---|---|---|
| Clarification Rate | <20% | How often system needs to ask clarifying questions |
| User Satisfaction | 4.5/5 | User survey (1-5 scale) |
| Adoption Rate | 60%+ | % of users who try it and continue using |
| Time to Insight | 3x faster | Compared to manual SQL writing |
13.4 Business Metrics
| Metric | Target | Impact |
|---|---|---|
| ARR | $60M | Revenue from Conversational BI feature |
| Market Share | #1 in NL2SQL | Best accuracy in the market (95%+ vs SOTA 68-80%) |
| Patent Value | $18M-$28M | Multi-turn conversation patent |
| Customer Logos | 500+ | Enterprise customers using feature |
14. Risk Mitigation
14.1 Technical Risks
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Accuracy below 95% | Medium | High | Use ensemble models, extensive fine-tuning, human-in-the-loop for edge cases |
| Latency exceeds 2s | Medium | Medium | Aggressive caching, model optimization, consider smaller models for simple queries |
| Model hallucination | Medium | High | Strict validation, self-correction loop, confidence thresholds |
| Schema drift | Low | Medium | Schema versioning, cache invalidation on schema changes |
14.2 Operational Risks
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| LLM API costs | Medium | Medium | Implement cost caps, local model fallback, query caching |
| LLM API downtime | Medium | High | Multi-provider strategy, local model fallback |
| Scalability issues | Low | High | Horizontal scaling, connection pooling, distributed caching |
| Security vulnerabilities | Low | High | SQL injection prevention, parameterized queries, access control |
14.3 Business Risks
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Low adoption | Low | High | User training, documentation, gradual rollout, collect feedback |
| Competitor launches similar | Medium | Medium | Patent filing, first-mover advantage, continuous improvement |
| Privacy concerns | Medium | Medium | Local model option, transparent data handling, compliance certifications |
15. Future Enhancements (Post-v7.0)
15.1 Advanced Features
- Multi-Modal Queries
  - Support screenshots of charts/tables as input
  - "Recreate this chart" → generates SQL + visualization config
- Proactive Insights
  - System suggests analyses based on data patterns
  - "I noticed sales dropped 15% in APAC last week. Want to investigate?"
- Automated Report Generation
  - "Create a weekly sales report" → generates scheduled queries
- Collaborative Features
  - Share conversation sessions across team
  - Comment on queries, suggest improvements
- Integration with BI Tools
  - Export to Tableau, Power BI, Looker
  - Generate dashboards from conversations
15.2 Research Directions
- Reinforcement Learning from Human Feedback (RLHF)
  - Learn from user corrections
  - Continuously improve accuracy
- Graph-Based Query Planning
  - Use graph neural networks for join optimization
  - Predict optimal execution plans
- Federated NL2SQL
  - Query across multiple databases
  - Handle heterogeneous schemas
Appendix A: API Examples
A.1 REST API
```
POST /api/v1/conversational-bi/sessions
Content-Type: application/json

{
  "user_id": "user123",
  "database": "sales_db"
}
```

Response:

```json
{
  "session_id": "sess_abc123",
  "created_at": "2025-11-09T10:00:00Z",
  "expires_at": "2025-11-09T10:30:00Z"
}
```

```
POST /api/v1/conversational-bi/query
Content-Type: application/json

{
  "session_id": "sess_abc123",
  "query": "Show me top 10 customers by revenue in 2024"
}
```

Response:

```json
{
  "turn_id": 1,
  "sql": "SELECT c.customer_name, SUM(o.amount) as revenue FROM customers c JOIN orders o ON c.id = o.customer_id WHERE YEAR(o.order_date) = 2024 GROUP BY c.id, c.customer_name ORDER BY revenue DESC LIMIT 10",
  "explanation": "This query finds the top 10 customers by total revenue in 2024...",
  "results": {
    "columns": ["customer_name", "revenue"],
    "rows": [
      ["Acme Corp", 1500000],
      ["TechStart Inc", 1200000],
      ...
    ]
  },
  "confidence": 0.95,
  "suggestions": [
    "Add index on customers(id) for 50% faster joins"
  ],
  "latency_ms": 1850
}
```

A.2 SQL Interface
```sql
-- Start conversation
SELECT conversational_bi.start_session('sales_db');
-- Returns: sess_abc123

-- Ask question
SELECT * FROM conversational_bi.ask(
    'sess_abc123',
    'Show me top 10 customers by revenue in 2024'
);
-- Returns: Generated SQL, explanation, results

-- Get conversation history
SELECT * FROM conversational_bi.history('sess_abc123');
```

A.3 Python SDK
```python
from heliosdb import ConversationalBI

# Initialize
bi = ConversationalBI(connection_string="postgresql://localhost/sales_db")

# Start session
session = bi.start_session(user_id="user123")

# Ask questions
response = session.ask("Show me top 10 customers by revenue in 2024")
print(response.sql)
print(response.explanation)
print(response.results.to_dataframe())

# Follow-up
response = session.ask("Which region had the highest?")
print(response.sql)  # Uses context from previous query

# Get optimization suggestions
for suggestion in response.suggestions:
    print(f"- {suggestion.description} (Impact: {suggestion.impact})")
```

Appendix B: Patent Analysis
B.1 Patentable Innovations
1. Multi-Turn Context-Aware NL2SQL System
- Claim: A method for maintaining conversation state across multiple natural language queries to a database, comprising:
  - Tracking entities mentioned across conversation turns
  - Resolving pronominal references using entity history
  - Building context-aware prompts that incorporate previous query results
  - Detecting when clarification is needed based on context ambiguity
- Confidence: 85%
- Prior Art: Limited systems handle 10+ turn conversations with full context retention
- Estimated Value: $18M-$28M
2. Self-Correcting SQL Generation via LLM Feedback
- Claim: A system for automatically correcting invalid SQL queries using iterative LLM refinement, comprising:
  - Validating generated SQL against database schema
  - Constructing correction prompts from validation errors
  - Iteratively refining SQL through multiple LLM calls
  - Terminating after N attempts or successful validation
- Confidence: 70%
- Prior Art: Some self-correction systems exist, but not specifically for SQL
- Estimated Value: $8M-$12M
3. Schema-Augmented Few-Shot Learning for NL2SQL
- Claim: A method for improving NL2SQL accuracy using augmented database schemas, comprising:
  - Enhancing schema with column descriptions, examples, and relationships
  - Selecting few-shot examples based on semantic similarity
  - Filtering schema to relevant tables/columns to reduce token count
  - Building prompts that combine schema, examples, and conversation context
- Confidence: 75%
- Prior Art: Few-shot NL2SQL exists, but schema augmentation is novel
- Estimated Value: $10M-$15M
B.2 Filing Recommendation
Priority: P0 (file immediately)
Justification: Core differentiators for the Conversational BI feature
Filing Strategy: US provisional → PCT → US/EU/China non-provisional
Appendix C: Competitive Analysis
| Feature | HeliosDB | Tableau Ask Data | Microsoft Copilot | ThoughtSpot | Metabase |
|---|---|---|---|---|---|
| Multi-Turn Context | 10+ turns | ❌ Single turn | ⚠ 3-5 turns | 5+ turns | ❌ Single turn |
| Accuracy (BIRD) | 95%+ | ❌ Not tested | ⚠ ~75% | ⚠ ~80% | ❌ Not tested |
| Complex Queries | CTEs, window functions | ⚠ Limited | Full SQL | Full SQL | ⚠ Limited |
| Explanation | NL + Visual | ⚠ Basic | Detailed | Detailed | ❌ None |
| Local Model Support | Yes (Ollama) | ❌ Cloud only | ❌ Cloud only | ❌ Cloud only | ❌ Cloud only |
| Multi-Dialect | 3+ dialects | ⚠ Limited | SQL Server | ⚠ Limited | 10+ dialects |
| Pricing | Included | $70/user/month | $30/user/month | $95/user/month | Free (OSS) |
Competitive Advantages:
- Best-in-class accuracy (95%+ vs 75-80%)
- Deepest multi-turn context (10+ turns vs 3-5)
- Local model support (privacy-sensitive use cases)
- Integrated with database (no separate BI tool needed)
- Open source option via Metabase integration
Document Status: Complete - Ready for Implementation

Next Steps:
- Review with engineering leadership
- File provisional patent applications (3 innovations)
- Update Series A materials with Conversational BI
- Begin Phase 1 implementation (Week 1: Core Infrastructure)
Total Investment: $1.0M-$1.4M
Timeline: 2.5-3 months
Expected ROI: $60M ARR, 43x-60x return