Skip to content

Production Deployment: Troubleshooting

Production Deployment: Troubleshooting

Part of: Production Deployment Guide


9.1 Common Issues

Issue: High Query Latency

Symptoms:

  • Slow query response times
  • Timeouts
  • High CPU usage

Diagnosis:

Terminal window
# Check slow queries
heliosdb-cli query slow-log --limit 10
# Check query plans
heliosdb-cli query explain --query-id <id>
# Check cache hit rate
heliosdb-cli metrics cache --component query-cache

Resolution:

  1. Add appropriate indexes
  2. Enable query cache
  3. Increase compute nodes
  4. Review and optimize query patterns

Issue: Replication Lag

Symptoms:

  • Stale reads
  • Data inconsistency
  • Replication lag alerts

Diagnosis:

Terminal window
# Check replication status
heliosdb-cli replication status
# Check network latency
heliosdb-cli network latency --nodes all
# Check WAL size
heliosdb-cli storage wal-status

Resolution:

  1. Increase network bandwidth
  2. Optimize WAL settings
  3. Add more storage nodes
  4. Check for slow nodes

Issue: Out of Memory

Symptoms:

  • OOM kills
  • Node crashes
  • Slow performance

Diagnosis:

Terminal window
# Check memory usage
heliosdb-cli metrics memory
# Check cache sizes
heliosdb-cli cache stats
# Review query memory usage
heliosdb-cli query memory-usage --top 10

Resolution:

  1. Reduce cache sizes
  2. Optimize query work_mem settings
  3. Add more RAM or nodes
  4. Enable memory limits per query

9.2 Debug Procedures

Enable Debug Logging:

Terminal window
# Set log level
heliosdb-cli config set logging.level debug
# Enable query logging
heliosdb-cli config set logging.query_log_enabled true
# Enable tracing
export RUST_LOG=trace
export RUST_BACKTRACE=full

Collect Diagnostics:

Terminal window
# Generate diagnostic report
heliosdb-cli diagnostics collect \
--output /tmp/heliosdb-diagnostics.tar.gz \
--include-logs \
--include-metrics \
--include-config \
--time-range 1h

9.3 Performance Tuning

Query Performance:

-- Analyze query performance
EXPLAIN ANALYZE SELECT * FROM large_table WHERE id = 1;
-- Update statistics
ANALYZE large_table;
-- Rebuild indexes
REINDEX TABLE large_table;

Storage Performance:

Terminal window
# Check I/O statistics
heliosdb-cli storage io-stats
# Optimize compaction
heliosdb-cli storage compact --level 0-1
# Check fragmentation
heliosdb-cli storage fragmentation

Network Performance:

Terminal window
# Test network bandwidth
heliosdb-cli network benchmark --nodes all
# Check connection pool
heliosdb-cli network connections --status
# Optimize TCP settings
sysctl -w net.ipv4.tcp_window_scaling=1
sysctl -w net.core.rmem_max=134217728
sysctl -w net.core.wmem_max=134217728