Production Deployment: Troubleshooting
Part of: Production Deployment Guide
9.1 Common Issues
Issue: High Query Latency
Symptoms:
- Slow query response times
- Timeouts
- High CPU usage
Diagnosis:
    # Check slow queries
    heliosdb-cli query slow-log --limit 10
    # Check query plans
    heliosdb-cli query explain --query-id <id>
    # Check cache hit rate
    heliosdb-cli metrics cache --component query-cache
Resolution:
- Add appropriate indexes
- Enable query cache
- Increase compute nodes
- Review and optimize query patterns
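When deciding whether the query cache is actually helping, the useful number is the hit rate: hits divided by total lookups. The output format of the cache metrics command is not shown in this guide, so the sketch below takes the hit and miss counts as plain arguments. The function name `cache_hit_rate` is a hypothetical helper, not part of the CLI.

```shell
# cache_hit_rate HITS MISSES
# Prints the cache hit rate as a percentage with one decimal place.
# Hypothetical helper: the counts are read off the cache metrics by hand,
# because the metrics output format is not specified here.
cache_hit_rate() {
    awk -v h="$1" -v m="$2" 'BEGIN {
        total = h + m
        # Avoid dividing by zero when the cache has seen no lookups yet.
        if (total == 0) { print "0.0"; exit }
        printf "%.1f\n", (h * 100) / total
    }'
}
```

For example, `cache_hit_rate 9200 800` prints `92.0`. What counts as a healthy rate depends on the workload; a rate that stays low after warm-up suggests the cache is too small or the queries are not repeatable.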
Issue: Replication Lag
Symptoms:
- Stale reads
- Data inconsistency
- Replication lag alerts
Diagnosis:
    # Check replication status
    heliosdb-cli replication status
    # Check network latency
    heliosdb-cli network latency --nodes all
    # Check WAL size
    heliosdb-cli storage wal-status
Resolution:
- Increase network bandwidth
- Optimize WAL settings
- Add more storage nodes
- Check for slow nodes
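To cross-check for slow nodes independently of the CLI, plain `ping` gives a quick round-trip estimate between hosts. This is a minimal sketch, assuming Linux iputils or BSD-style `ping` output that ends with a `min/avg/max` summary line. The function name `ping_avg_ms` and the host name in the usage comment are placeholders.

```shell
# ping_avg_ms: read ping output on stdin and print the average RTT in ms.
# Relies on the summary line, e.g.
#   rtt min/avg/max/mdev = 0.045/0.052/0.060/0.006 ms
# Splitting that line on "/" puts the average in the fifth field.
ping_avg_ms() {
    awk -F'/' '/min\/avg\/max/ { print $5 }'
}

# Usage (hypothetical host name):
#   ping -c 5 node-2.example.internal | ping_avg_ms
```

Running this from the primary against each replica host makes an outlier easy to spot. A node whose average is several times higher than its peers is a reasonable first suspect for replication lag.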
Issue: Out of Memory
Symptoms:
- OOM kills
- Node crashes
- Slow performance
Diagnosis:
    # Check memory usage
    heliosdb-cli metrics memory
    # Check cache sizes
    heliosdb-cli cache stats
    # Review query memory usage
    heliosdb-cli query memory-usage --top 10
Resolution:
- Reduce cache sizes
- Optimize query work_mem settings
- Add more RAM or nodes
- Enable memory limits per query
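Before changing memory settings, it helps to confirm from the kernel log that the crashes really were OOM kills. This is a small sketch for Linux hosts. The function name `oom_events` is illustrative, and the patterns match the usual kernel messages ("Out of memory: Killed process ...", "... invoked oom-killer") rather than anything HeliosDB-specific.

```shell
# oom_events: filter kernel log lines on stdin down to OOM-killer activity.
# "|| true" keeps the exit status at 0 when nothing matches, so the helper
# is safe to call from scripts that run with "set -e".
oom_events() {
    grep -Ei 'out of memory|oom-kill|killed process' || true
}

# Usage (reading the kernel log may require root):
#   journalctl -k --since "1 hour ago" | oom_events
#   dmesg -T | oom_events
```

If this prints nothing for the crash window, the node most likely died for another reason, and the memory resolutions above are unlikely to help.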
9.2 Debug Procedures
Enable Debug Logging:
    # Set log level
    heliosdb-cli config set logging.level debug
    # Enable query logging
    heliosdb-cli config set logging.query_log_enabled true
    # Enable tracing
    export RUST_LOG=trace
    export RUST_BACKTRACE=full
Collect Diagnostics:
    # Generate diagnostic report
    heliosdb-cli diagnostics collect \
      --output /tmp/heliosdb-diagnostics.tar.gz \
      --include-logs \
      --include-metrics \
      --include-config \
      --time-range 1h
9.3 Performance Tuning
Query Performance:
    -- Analyze query performance
    EXPLAIN ANALYZE SELECT * FROM large_table WHERE id = 1;
    -- Update statistics
    ANALYZE large_table;
    -- Rebuild indexes
    REINDEX TABLE large_table;
Storage Performance:
    # Check I/O statistics
    heliosdb-cli storage io-stats
    # Optimize compaction
    heliosdb-cli storage compact --level 0-1
    # Check fragmentation
    heliosdb-cli storage fragmentation
Network Performance:
    # Test network bandwidth
    heliosdb-cli network benchmark --nodes all
    # Check connection pool
    heliosdb-cli network connections --status
    # Optimize TCP settings
    sysctl -w net.ipv4.tcp_window_scaling=1
    sysctl -w net.core.rmem_max=134217728
    sysctl -w net.core.wmem_max=134217728
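The 134217728-byte (128 MiB) socket buffer ceilings in the TCP settings above can be sanity-checked against the bandwidth-delay product (BDP) of the link. A single TCP connection needs roughly bandwidth × round-trip time of buffer to keep the link full. This is a small sketch. `bdp_bytes` is an illustrative helper name, and the link figures in the example are assumptions, not measurements from any particular deployment.

```shell
# bdp_bytes MBPS RTT_MS
# Bandwidth-delay product in bytes for a link of MBPS megabits per second
# and a round-trip time of RTT_MS milliseconds:
#   bytes = (MBPS * 1,000,000 / 8) * (RTT_MS / 1000) = MBPS * RTT_MS * 125
bdp_bytes() {
    echo $(( $1 * $2 * 125 ))
}
```

For a 10 Gbit/s link with a 100 ms round trip, `bdp_bytes 10000 100` prints `125000000` (about 119 MiB), which fits under the 128 MiB ceiling. Within a single data center, where round trips are usually well under a millisecond, the required buffer is far smaller, so the ceiling matters mainly for cross-region replication.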