ClickHouse to HeliosDB Migration Guide
Version: 1.0 | Last Updated: January 2026 | Compatibility: HeliosDB 7.0+, ClickHouse 22.x/23.x/24.x
Table of Contents
- Introduction
- Compatibility Overview
- Pre-Migration Assessment
- Connection String Migration
- Data Type Mapping
- Table Engine Mapping
- Query Syntax Migration
- Materialized View Migration
- Data Migration Process
- Application Connectivity
- Performance Considerations
- Troubleshooting Common Issues
- Post-Migration Validation
- Appendix
1. Introduction
Why Migrate from ClickHouse to HeliosDB?
HeliosDB provides 92% ClickHouse compatibility while offering significant advantages for modern data platforms:
Key Benefits
| Benefit | Description |
|---|---|
| Multi-Protocol Access | Access analytics data via ClickHouse, PostgreSQL, MySQL, MongoDB, Redis, GraphQL, and REST |
| Unified Data Platform | Single system for OLTP, OLAP, and streaming workloads |
| AI/ML Integration | Native NL2SQL, vector search, and ML model inference |
| Simplified Operations | No ZooKeeper/Keeper dependency, automatic rebalancing |
| Enhanced ACID | Full ACID compliance beyond ClickHouse’s eventual consistency |
| Time Travel | Point-in-time queries and data recovery |
| Multi-Model Support | Relational, document, graph, time-series, and vector in one database |
Migration Complexity: Low to Moderate
ClickHouse migrations are straightforward due to:
- 100% Native Protocol compatibility (TCP port 9000)
- 100% HTTP Protocol compatibility (port 8123)
- Standard ClickHouse drivers work without modification
- Most SQL queries execute identically
- MergeTree family engines fully supported
Scope
This guide covers:
- Schema migration for all table types
- Data migration strategies for various dataset sizes
- Application connection string and driver updates
- Table engine mapping to HeliosDB equivalents
- Query syntax differences and translations
- Materialized view migration
- Performance optimization post-migration
Target Audience
- Data engineers migrating ClickHouse analytics platforms
- DevOps engineers modernizing data infrastructure
- Architects evaluating unified data platforms
- Application developers updating ClickHouse integrations
2. Compatibility Overview
Overall Compatibility: 92%
| Category | Coverage | Status |
|---|---|---|
| Native Protocol (TCP) | 100% | Complete |
| HTTP Protocol | 100% | Complete |
| SQL Language | 90% | Near-complete |
| Table Engines | 85% | Core engines |
| Data Types | 95% | All standard types |
| Aggregation Functions | 95% | Full analytics support |
Protocol Support
Native Protocol (TCP Port 9000)
| Feature | Status | Notes |
|---|---|---|
| Connection | Supported | Full protocol compatibility |
| Authentication | Supported | Password-based, LDAP |
| Compression | Supported | LZ4, ZSTD, LZ4HC |
| SSL/TLS | Supported | Encrypted connections |
| Query Execution | Supported | Full SQL support |
| Batch Insert | Supported | High throughput |
| Prepared Statements | Supported | Query caching |
| Query Cancellation | Supported | Cancel running queries |
HTTP Protocol (Port 8123)
| Feature | Status | Notes |
|---|---|---|
| GET Queries | Supported | Read operations |
| POST Queries | Supported | Write operations |
| Streaming | Supported | Large result sets |
| Compression | Supported | gzip, deflate, br |
| JSON Output | Supported | Multiple formats |
| Progress Tracking | Supported | Query progress |
Driver Compatibility Matrix
| Language | Driver | Version | Status |
|---|---|---|---|
| Python | clickhouse-driver | 0.2+ | Fully Compatible |
| Python | clickhouse-connect | 0.6+ | Fully Compatible |
| Go | clickhouse-go | 2.x | Fully Compatible |
| Node.js | @clickhouse/client | 0.2+ | Fully Compatible |
| Java | ClickHouse JDBC | 0.4+ | Fully Compatible |
| Java | ClickHouse Native | 0.4+ | Fully Compatible |
| Rust | clickhouse-rs | 0.12+ | Fully Compatible |
| C# | ClickHouse.Client | 5.x | Fully Compatible |
| PHP | smi2/phpClickHouse | 1.x | Fully Compatible |
clickhouse-client Compatibility
```bash
# Connect to HeliosDB using standard clickhouse-client
clickhouse-client --host localhost --port 9000

# All standard options work
clickhouse-client \
  --host heliosdb.host \
  --port 9000 \
  --user default \
  --password '' \
  --database analytics \
  --query "SELECT count() FROM events"
```
3. Pre-Migration Assessment
3.1 Pre-Migration Checklist
- Backup ClickHouse data (full backup)
- Document cluster topology (replicas, shards)
- Inventory all databases and tables
- Analyze table engines in use
- Catalog materialized views
- Document dictionaries and external tables
- Estimate total data volume
- Review ZooKeeper/Keeper dependencies
- Identify TTL requirements
- List application connection configurations
- Plan migration window and rollback strategy
- Test migration in staging environment
3.2 Database and Table Inventory
Export Schema Information
```bash
# Export all database schemas
clickhouse-client --query "SHOW DATABASES" > databases.txt

# Export table definitions for each database
for db in $(grep -v system databases.txt); do
  clickhouse-client --query \
    "SELECT create_table_query FROM system.tables WHERE database = '$db'" \
    > schema_${db}.sql 2>/dev/null
done

# Export complete schema inventory
clickhouse-client --query "
  SELECT database, name, engine, partition_key, sorting_key, primary_key,
         total_rows, total_bytes
  FROM system.tables
  WHERE database NOT IN ('system', 'INFORMATION_SCHEMA')" \
  --format TSV > table_inventory.tsv
```
Analyze Table Engines
```sql
-- List all table engines in use
SELECT
    engine,
    count() AS table_count,
    sum(total_rows) AS total_rows,
    formatReadableSize(sum(total_bytes)) AS total_size
FROM system.tables
WHERE database NOT IN ('system', 'INFORMATION_SCHEMA')
GROUP BY engine
ORDER BY table_count DESC;

-- Detailed table information
SELECT
    database,
    name,
    engine,
    partition_key,
    sorting_key,
    primary_key,
    total_rows,
    formatReadableSize(total_bytes) AS size
FROM system.tables
WHERE database NOT IN ('system', 'INFORMATION_SCHEMA')
ORDER BY total_bytes DESC;
```
3.3 Feature Usage Inventory
Document usage of the following features:
| Feature | Used? | Tables Affected | HeliosDB Support |
|---|---|---|---|
| MergeTree | | | Full |
| ReplacingMergeTree | | | Full |
| SummingMergeTree | | | Full |
| AggregatingMergeTree | | | Full |
| CollapsingMergeTree | | | Full |
| VersionedCollapsingMergeTree | | | Full |
| Distributed Tables | | | Full |
| Materialized Views | | | Full |
| TTL | | | Full |
| Dictionaries | | | Full |
| S3/External Tables | | | Full |
| Kafka Integration | | | Full |
| Replicated* Engines | | | Different approach |
3.4 Data Volume Estimation
```sql
-- Estimate total data size
SELECT
    sum(total_bytes) AS bytes,
    formatReadableSize(sum(total_bytes)) AS readable_size,
    sum(total_rows) AS total_rows
FROM system.tables
WHERE database NOT IN ('system', 'INFORMATION_SCHEMA');

-- Per-database breakdown
SELECT
    database,
    count() AS tables,
    formatReadableSize(sum(total_bytes)) AS size,
    sum(total_rows) AS rows
FROM system.tables
WHERE database NOT IN ('system', 'INFORMATION_SCHEMA')
GROUP BY database
ORDER BY sum(total_bytes) DESC;
```
Storage Planning for HeliosDB
| ClickHouse Size | Recommended HeliosDB Storage | Notes |
|---|---|---|
| < 100 GB | 1.1x ClickHouse size | Similar compression |
| 100 GB - 1 TB | 1.0x - 1.1x | Comparable efficiency |
| > 1 TB | 0.95x - 1.0x | Intelligent compression |
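For quick capacity planning, the multipliers in the table above can be applied programmatically. This is a minimal sketch that conservatively uses the upper bound of each range; the function name `estimate_heliosdb_storage` is illustrative, not part of any tool:

```python
def estimate_heliosdb_storage(clickhouse_gb: float) -> float:
    """Estimate required HeliosDB storage (GB) from the current ClickHouse
    footprint, using the upper-bound multipliers from the planning table."""
    if clickhouse_gb < 1024:
        factor = 1.1   # < 1 TB: similar compression, plan for up to 1.1x
    else:
        factor = 1.0   # > 1 TB: intelligent compression, plan for up to 1.0x
    return round(clickhouse_gb * factor, 1)

print(estimate_heliosdb_storage(50))    # 55.0
print(estimate_heliosdb_storage(2048))  # 2048.0
```

Rounding to one decimal keeps the result readable; for real sizing, add headroom for merges and temporary data on top of this estimate.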
3.5 Query Pattern Analysis
```sql
-- Analyze recent query patterns
SELECT
    type,
    query_kind,
    count() AS query_count,
    avg(query_duration_ms) AS avg_duration_ms,
    sum(read_rows) AS total_rows_read
FROM system.query_log
WHERE event_time > now() - INTERVAL 7 DAY
  AND type = 'QueryFinish'
GROUP BY type, query_kind
ORDER BY query_count DESC;

-- Identify heavy queries
SELECT
    query,
    count() AS executions,
    avg(query_duration_ms) AS avg_ms,
    formatReadableSize(avg(read_bytes)) AS avg_read
FROM system.query_log
WHERE event_time > now() - INTERVAL 7 DAY
  AND type = 'QueryFinish'
GROUP BY query
ORDER BY avg_ms DESC
LIMIT 20;
```
4. Connection String Migration
4.1 Connection String Format
ClickHouse and HeliosDB use identical connection formats:
Before (ClickHouse):
```
clickhouse://user:password@clickhouse.host:9000/database
```
After (HeliosDB):
```
clickhouse://user:password@heliosdb.host:9000/database
```
4.2 Native Protocol Connection
Basic Connection (Host Change Only)
ClickHouse:
```python
from clickhouse_driver import Client

client = Client(
    host='clickhouse.host',
    port=9000,
    user='default',
    password='mypassword',
    database='analytics'
)
```
HeliosDB (only the host changes):
```python
from clickhouse_driver import Client

client = Client(
    host='heliosdb.host',  # Only this changes
    port=9000,
    user='default',
    password='mypassword',
    database='analytics'
)
```
REST API Endpoint
ClickHouse:
```bash
curl 'http://clickhouse.host:8123/?query=SELECT%201'
```
HeliosDB:
```bash
curl 'http://heliosdb.host:8123/?query=SELECT%201'
```
4.4 Connection Parameters Reference
| Parameter | Default | Description | HeliosDB Support |
|---|---|---|---|
| host | localhost | Server hostname | Full |
| port | 9000 | Native protocol port | Full |
| http_port | 8123 | HTTP protocol port | Full |
| user | default | Username | Full |
| password | - | Password | Full |
| database | default | Database name | Full |
| compression | lz4 | Compression type | LZ4, ZSTD |
| secure | false | Use TLS | Full |
| verify | true | Verify TLS certs | Full |
| connect_timeout | 10 | Connection timeout (s) | Full |
| send_receive_timeout | 300 | Query timeout (s) | Full |
| sync_request_timeout | 5 | Sync timeout (s) | Full |
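Because HeliosDB accepts the same parameters, application configs usually need only the host changed. A small sketch of a helper that merges per-environment overrides into the defaults listed above (the `DEFAULTS` dict and `client_kwargs` name are illustrative, not part of any driver):

```python
# Defaults taken from the connection-parameter table above
DEFAULTS = {
    'host': 'localhost',
    'port': 9000,
    'user': 'default',
    'database': 'default',
    'compression': 'lz4',
    'secure': False,
    'connect_timeout': 10,
    'send_receive_timeout': 300,
}

def client_kwargs(**overrides):
    """Return keyword arguments for a clickhouse-driver Client:
    the table defaults, with any caller-supplied overrides applied."""
    merged = dict(DEFAULTS)
    merged.update(overrides)
    return merged

# Migrating typically means overriding only host (and database):
kwargs = client_kwargs(host='heliosdb.host', database='analytics')
```

The returned dict can be splatted into the driver, e.g. `Client(**kwargs)`, keeping one source of truth for connection settings across environments.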
4.5 SSL/TLS Configuration
```python
from clickhouse_driver import Client

# Secure connection with TLS
client = Client(
    host='heliosdb.host',
    port=9440,  # Secure native port
    user='default',
    password='mypassword',
    database='analytics',
    secure=True,
    verify=True,
    ca_certs='/path/to/ca.crt'
)
```
4.6 Connection Pool Configuration
```python
from clickhouse_driver import Client

# Production connection pool settings
client = Client(
    host='heliosdb.host',
    port=9000,
    user='default',
    password='mypassword',
    database='analytics',
    compression=True,
    settings={
        'max_threads': 8,
        'max_execution_time': 300,
        'max_memory_usage': 10000000000,  # 10 GB
        'connect_timeout': 10,
        'send_receive_timeout': 300
    }
)
```
5. Data Type Mapping
5.1 Numeric Types
| ClickHouse Type | HeliosDB Type | Notes |
|---|---|---|
| UInt8 | UInt8 | Identical |
| UInt16 | UInt16 | Identical |
| UInt32 | UInt32 | Identical |
| UInt64 | UInt64 | Identical |
| UInt128 | UInt128 | Identical |
| UInt256 | UInt256 | Identical |
| Int8 | Int8 | Identical |
| Int16 | Int16 | Identical |
| Int32 | Int32 | Identical |
| Int64 | Int64 | Identical |
| Int128 | Int128 | Identical |
| Int256 | Int256 | Identical |
| Float32 | Float32 | IEEE 754 |
| Float64 | Float64 | IEEE 754 |
| Decimal(P, S) | Decimal(P, S) | Arbitrary precision |
| Decimal32(S) | Decimal32(S) | 9 digit precision |
| Decimal64(S) | Decimal64(S) | 18 digit precision |
| Decimal128(S) | Decimal128(S) | 38 digit precision |
5.2 String Types
| ClickHouse Type | HeliosDB Type | Notes |
|---|---|---|
| String | String | Variable length UTF-8 |
| FixedString(N) | FixedString(N) | Fixed byte length |
| UUID | UUID | 128-bit UUID |
| IPv4 | IPv4 | IPv4 address |
| IPv6 | IPv6 | IPv6 address |
| Enum8 | Enum8 | 8-bit enumeration |
| Enum16 | Enum16 | 16-bit enumeration |
5.3 Date and Time Types
| ClickHouse Type | HeliosDB Type | Notes |
|---|---|---|
| Date | Date | Days since epoch |
| Date32 | Date32 | Extended range |
| DateTime | DateTime | Second precision |
| DateTime64(N) | DateTime64(N) | Subsecond precision |
| DateTime64(N, tz) | DateTime64(N, tz) | With timezone |
5.4 Composite Types
| ClickHouse Type | HeliosDB Type | Notes |
|---|---|---|
| Array(T) | Array(T) | Dynamic arrays |
| Tuple(T1, T2, …) | Tuple(T1, T2, …) | Fixed-size tuples |
| Map(K, V) | Map(K, V) | Key-value maps |
| Nested(name Type, …) | Nested(name Type, …) | Nested columns |
| Nullable(T) | Nullable(T) | NULL support |
| LowCardinality(T) | LowCardinality(T) | Dictionary encoding |
5.5 Special Types
| ClickHouse Type | HeliosDB Type | Notes |
|---|---|---|
| Bool | Bool | Boolean |
| JSON | JSON | JSON objects |
| Object('json') | Object('json') | JSON objects |
| Point | Point | Geospatial point |
| Ring | Ring | Geospatial ring |
| Polygon | Polygon | Geospatial polygon |
| MultiPolygon | MultiPolygon | Geospatial multi-polygon |
5.6 Aggregate Function Types
| ClickHouse Type | HeliosDB Type | Notes |
|---|---|---|
| AggregateFunction(name, T) | AggregateFunction(name, T) | Aggregate state |
| SimpleAggregateFunction(name, T) | SimpleAggregateFunction(name, T) | Simple aggregate |
5.7 Type Conversion Examples
```sql
-- All type casts work identically

-- Numeric conversions
SELECT toUInt32(123.45);            -- 123
SELECT toFloat64('123.456');        -- 123.456
SELECT toDecimal64(123.456789, 4);  -- 123.4568

-- String conversions
SELECT toString(12345);             -- '12345'
SELECT toFixedString('hello', 10);  -- 'hello\0\0\0\0\0'

-- Date/Time conversions
SELECT toDate('2025-01-15');                        -- 2025-01-15
SELECT toDateTime('2025-01-15 10:30:00');           -- 2025-01-15 10:30:00
SELECT toDateTime64('2025-01-15 10:30:00.123', 3);

-- Array conversions
SELECT toTypeName([1, 2, 3]);  -- Array(UInt8)
SELECT array(1, 2, 3);         -- [1, 2, 3]

-- Nullable handling
SELECT toNullable(123);
SELECT assumeNotNull(nullable_column);
```
6. Table Engine Mapping
6.1 MergeTree Family
MergeTree
```sql
-- ClickHouse
CREATE TABLE events (
    timestamp DateTime,
    event_type String,
    user_id UInt32,
    value Float64
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, user_id)
SETTINGS index_granularity = 8192;

-- HeliosDB (identical syntax)
CREATE TABLE events (
    timestamp DateTime,
    event_type String,
    user_id UInt32,
    value Float64
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, user_id)
SETTINGS index_granularity = 8192;
```
ReplacingMergeTree
```sql
-- For upsert/deduplication scenarios
CREATE TABLE user_states (
    user_id UInt32,
    version UInt64,
    name String,
    status String,
    updated_at DateTime
) ENGINE = ReplacingMergeTree(version)
PARTITION BY toYYYYMM(updated_at)
ORDER BY user_id;

-- Query with deduplication
SELECT * FROM user_states FINAL WHERE user_id = 12345;
```
SummingMergeTree
```sql
-- For pre-aggregated metrics
CREATE TABLE metrics_daily (
    date Date,
    metric_name String,
    total_value Float64,
    event_count UInt64
) ENGINE = SummingMergeTree((total_value, event_count))
PARTITION BY toYYYYMM(date)
ORDER BY (date, metric_name);

-- Rows with the same ORDER BY key are summed
INSERT INTO metrics_daily VALUES ('2025-01-15', 'clicks', 100, 10);
INSERT INTO metrics_daily VALUES ('2025-01-15', 'clicks', 50, 5);
-- After merge: ('2025-01-15', 'clicks', 150, 15)
```
AggregatingMergeTree
```sql
-- For complex pre-aggregations
CREATE TABLE events_hourly (
    hour DateTime,
    event_type String,
    count_state AggregateFunction(count, UInt64),
    sum_value_state AggregateFunction(sum, Float64),
    uniq_users_state AggregateFunction(uniq, UInt32)
) ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(hour)
ORDER BY (hour, event_type);

-- Insert with aggregate function states
INSERT INTO events_hourly
SELECT
    toStartOfHour(timestamp) AS hour,
    event_type,
    countState() AS count_state,
    sumState(value) AS sum_value_state,
    uniqState(user_id) AS uniq_users_state
FROM events
GROUP BY hour, event_type;

-- Query with merge
SELECT
    hour,
    event_type,
    countMerge(count_state) AS total_count,
    sumMerge(sum_value_state) AS total_value,
    uniqMerge(uniq_users_state) AS unique_users
FROM events_hourly
GROUP BY hour, event_type;
```
CollapsingMergeTree
```sql
-- For state change tracking
CREATE TABLE user_sessions (
    user_id UInt32,
    session_start DateTime,
    session_duration UInt32,
    page_views UInt32,
    sign Int8  -- 1 for insert, -1 for delete
) ENGINE = CollapsingMergeTree(sign)
PARTITION BY toYYYYMM(session_start)
ORDER BY (user_id, session_start);

-- Insert initial state
INSERT INTO user_sessions VALUES (1, '2025-01-15 10:00:00', 300, 5, 1);

-- Update by inserting the old row with -1 and the new row with 1
INSERT INTO user_sessions VALUES
    (1, '2025-01-15 10:00:00', 300, 5, -1),  -- Cancel old
    (1, '2025-01-15 10:00:00', 600, 10, 1);  -- Insert new

-- Query with collapsing
SELECT
    user_id,
    session_start,
    sum(session_duration * sign) AS duration,
    sum(page_views * sign) AS pages
FROM user_sessions
GROUP BY user_id, session_start
HAVING sum(sign) > 0;
```
VersionedCollapsingMergeTree
```sql
-- For ordered state changes with versions
CREATE TABLE user_states_versioned (
    user_id UInt32,
    status String,
    version UInt64,
    sign Int8
) ENGINE = VersionedCollapsingMergeTree(sign, version)
ORDER BY user_id;
```
6.2 Integration Engines
Distributed Tables
```sql
-- ClickHouse distributed table
CREATE TABLE events_distributed AS events
ENGINE = Distributed(cluster_name, database, events, rand());

-- HeliosDB (same syntax, uses internal distribution)
CREATE TABLE events_distributed AS events
ENGINE = Distributed(default, analytics, events, rand());
```
Kafka Integration
```sql
-- Kafka source table
CREATE TABLE events_kafka (
    timestamp DateTime,
    event_type String,
    user_id UInt32,
    value Float64
) ENGINE = Kafka()
SETTINGS
    kafka_broker_list = 'kafka:9092',
    kafka_topic_list = 'events',
    kafka_group_name = 'heliosdb_consumer',
    kafka_format = 'JSONEachRow';

-- Materialized view to persist data
CREATE MATERIALIZED VIEW events_mv TO events AS
SELECT * FROM events_kafka;
```
S3 Tables
```sql
-- External S3 table
CREATE TABLE s3_data (
    timestamp DateTime,
    event_type String,
    value Float64
) ENGINE = S3(
    'https://bucket.s3.amazonaws.com/data/*.parquet',
    'AWS_ACCESS_KEY',
    'AWS_SECRET_KEY',
    'Parquet'
);

-- Query external data
SELECT event_type, sum(value)
FROM s3_data
GROUP BY event_type;
```
6.3 Special Engines
Memory Engine
```sql
-- In-memory table (same syntax)
CREATE TABLE temp_data (
    id UInt32,
    value String
) ENGINE = Memory;
```
Log Engines
```sql
-- Simple log table
CREATE TABLE log_data (
    timestamp DateTime,
    message String
) ENGINE = Log;

-- TinyLog for small tables
CREATE TABLE config_data (
    key String,
    value String
) ENGINE = TinyLog;
```
Dictionary Engine
```sql
-- Create dictionary
CREATE DICTIONARY geo_dict (
    country_code String,
    country_name String,
    population UInt64
)
PRIMARY KEY country_code
SOURCE(CLICKHOUSE(
    HOST 'localhost' PORT 9000 USER 'default'
    TABLE 'countries' DB 'reference'
))
LIFETIME(MIN 3600 MAX 7200)
LAYOUT(FLAT());

-- Use in queries
SELECT dictGet('geo_dict', 'country_name', country_code) AS country
FROM events;
```
6.4 Engine Migration Matrix
| ClickHouse Engine | HeliosDB Equivalent | Migration Complexity |
|---|---|---|
| MergeTree | MergeTree | None (identical) |
| ReplacingMergeTree | ReplacingMergeTree | None (identical) |
| SummingMergeTree | SummingMergeTree | None (identical) |
| AggregatingMergeTree | AggregatingMergeTree | None (identical) |
| CollapsingMergeTree | CollapsingMergeTree | None (identical) |
| VersionedCollapsingMergeTree | VersionedCollapsingMergeTree | None (identical) |
| Distributed | Distributed | Cluster config changes |
| ReplicatedMergeTree | MergeTree + HeliosDB replication | Schema change |
| Kafka | Kafka | Connection config only |
| S3 | S3 | Credential config only |
| Memory | Memory | None (identical) |
| Log/TinyLog | Log/TinyLog | None (identical) |
| Dictionary | Dictionary | None (identical) |
6.5 Replicated Engine Migration
ClickHouse Replicated* engines use ZooKeeper/Keeper. HeliosDB uses its own replication:
ClickHouse:
```sql
CREATE TABLE events_replicated (
    timestamp DateTime,
    event_type String,
    user_id UInt32,
    value Float64
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, user_id);
```
HeliosDB:
```sql
-- Use standard MergeTree; replication is automatic
CREATE TABLE events (
    timestamp DateTime,
    event_type String,
    user_id UInt32,
    value Float64
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, user_id);

-- Configure replication in HeliosDB settings
-- No ZooKeeper/Keeper required
```
7. Query Syntax Migration
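Since the Replicated* family is the one engine group that needs a schema change, it is worth automating when processing bulk schema dumps. Below is a hedged sketch (not a supported tool) that rewrites `Replicated*MergeTree('<zk_path>', '<replica>'[, extra args])` to the plain engine, preserving any extra engine arguments such as a ReplacingMergeTree version column; unusual DDL forms should still be reviewed by hand:

```python
import re

# Matches Replicated<X>MergeTree('<zk_path>', '<replica>'[, <extra args>])
_REPLICATED = re.compile(
    r"Replicated(\w*MergeTree)\(\s*'[^']*'\s*,\s*'[^']*'\s*(?:,\s*([^)]*))?\)"
)

def strip_replication(ddl: str) -> str:
    """Rewrite Replicated* engines in a DDL string to their plain
    equivalents, dropping the ZooKeeper path and replica arguments."""
    def repl(match):
        engine, extra = match.group(1), match.group(2)
        return f"{engine}({extra.strip()})" if extra else f"{engine}()"
    return _REPLICATED.sub(repl, ddl)

ddl = "ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')"
print(strip_replication(ddl))  # ENGINE = MergeTree()
```

Run over a dumped schema file, this converts every replicated table definition in one pass before replaying the DDL against HeliosDB.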
7.1 Identical Syntax (No Changes Required)
Most ClickHouse queries work identically in HeliosDB:
```sql
-- SELECT with aggregations
SELECT
    event_type,
    count() AS events,
    count(DISTINCT user_id) AS unique_users,
    sum(value) AS total_value,
    avg(value) AS avg_value,
    min(value) AS min_value,
    max(value) AS max_value
FROM events
WHERE timestamp >= '2025-01-01'
GROUP BY event_type
ORDER BY events DESC
LIMIT 100;

-- PREWHERE for optimization
SELECT *
FROM events
PREWHERE timestamp >= '2025-01-01'
WHERE event_type = 'purchase';

-- SAMPLE for approximate queries
SELECT event_type, count() * 10 AS estimated_count
FROM events SAMPLE 0.1
GROUP BY event_type;

-- FINAL for deduplication
SELECT * FROM user_states FINAL
WHERE user_id = 12345;

-- WITH clauses (CTEs)
WITH daily_totals AS (
    SELECT toDate(timestamp) AS date, sum(value) AS total
    FROM events
    GROUP BY date
)
SELECT
    date,
    total,
    total - lagInFrame(total) OVER (ORDER BY date) AS change
FROM daily_totals;
```
7.2 Window Functions
```sql
-- All window functions supported
SELECT
    timestamp,
    user_id,
    value,
    row_number() OVER (PARTITION BY user_id ORDER BY timestamp) AS row_num,
    rank() OVER (PARTITION BY user_id ORDER BY value DESC) AS value_rank,
    dense_rank() OVER (PARTITION BY user_id ORDER BY value DESC) AS dense_value_rank,
    sum(value) OVER (PARTITION BY user_id ORDER BY timestamp) AS running_total,
    avg(value) OVER (
        PARTITION BY user_id ORDER BY timestamp
        ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
    ) AS moving_avg,
    lead(value, 1) OVER (PARTITION BY user_id ORDER BY timestamp) AS next_value,
    lag(value, 1) OVER (PARTITION BY user_id ORDER BY timestamp) AS prev_value,
    first_value(value) OVER (PARTITION BY user_id ORDER BY timestamp) AS first_val,
    last_value(value) OVER (
        PARTITION BY user_id ORDER BY timestamp
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_val
FROM events
ORDER BY user_id, timestamp;
```
7.3 JOIN Operations
```sql
-- All JOIN types supported

-- INNER JOIN
SELECT e.*, u.name
FROM events e
INNER JOIN users u ON e.user_id = u.user_id;

-- LEFT JOIN
SELECT e.*, u.name
FROM events e
LEFT JOIN users u ON e.user_id = u.user_id;

-- RIGHT JOIN
SELECT e.*, u.name
FROM events e
RIGHT JOIN users u ON e.user_id = u.user_id;

-- FULL OUTER JOIN
SELECT e.*, u.name
FROM events e
FULL JOIN users u ON e.user_id = u.user_id;

-- CROSS JOIN
SELECT *
FROM events e
CROSS JOIN config c;

-- ASOF JOIN (time-series join)
SELECT e.timestamp, e.user_id, e.value, p.price
FROM events e
ASOF LEFT JOIN prices p
    ON e.product_id = p.product_id
    AND e.timestamp >= p.effective_date;

-- GLOBAL JOIN for distributed tables
SELECT *
FROM events_distributed e
GLOBAL JOIN users_distributed u ON e.user_id = u.user_id;
```
7.4 Aggregation Functions
```sql
-- Standard aggregations
SELECT
    count(),
    countDistinct(user_id),
    sum(value),
    avg(value),
    min(value),
    max(value),
    any(event_type),
    anyHeavy(event_type),
    anyLast(event_type)
FROM events;

-- Statistical functions
SELECT
    stddevPop(value) AS stddev,
    stddevSamp(value) AS stddev_sample,
    varPop(value) AS variance,
    varSamp(value) AS variance_sample,
    covarPop(value, other_value) AS covariance,
    corr(value, other_value) AS correlation
FROM events;

-- Quantile functions
SELECT
    quantile(0.5)(value) AS median,
    quantile(0.95)(value) AS p95,
    quantile(0.99)(value) AS p99,
    quantileExact(0.5)(value) AS exact_median,
    quantiles(0.25, 0.5, 0.75)(value) AS quartiles,
    quantileTiming(0.95)(response_ms) AS timing_p95
FROM events;

-- Unique count functions
SELECT
    uniq(user_id) AS approx_unique,
    uniqExact(user_id) AS exact_unique,
    uniqCombined(user_id) AS combined_unique,
    uniqHLL12(user_id) AS hll_unique
FROM events;

-- TopK functions
SELECT
    topK(10)(event_type) AS top_events,
    topKWeighted(10)(event_type, value) AS weighted_top
FROM events;

-- Conditional aggregations
SELECT
    countIf(event_type = 'purchase') AS purchases,
    sumIf(value, event_type = 'purchase') AS purchase_value,
    avgIf(value, event_type = 'view') AS avg_view_value
FROM events;
```
7.5 Array Functions
```sql
-- Array operations
SELECT
    [1, 2, 3] AS arr,
    array(1, 2, 3) AS arr2,
    arrayJoin([1, 2, 3]) AS expanded,
    arrayMap(x -> x * 2, [1, 2, 3]) AS doubled,
    arrayFilter(x -> x > 1, [1, 2, 3]) AS filtered,
    arrayReduce('sum', [1, 2, 3]) AS reduced,
    arraySort([3, 1, 2]) AS sorted,
    arrayReverse([1, 2, 3]) AS reversed,
    arraySlice([1, 2, 3, 4, 5], 2, 3) AS sliced,
    arrayConcat([1, 2], [3, 4]) AS concatenated,
    has([1, 2, 3], 2) AS contains,
    indexOf([1, 2, 3], 2) AS position,
    length([1, 2, 3]) AS len,
    arrayUniq([1, 1, 2, 2, 3]) AS unique_count;

-- Array aggregations
SELECT
    groupArray(user_id) AS user_ids,
    groupArrayDistinct(user_id) AS unique_user_ids,
    groupUniqArray(user_id) AS uniq_array
FROM events
WHERE event_type = 'purchase';
```
7.6 String Functions
```sql
-- String operations
SELECT
    concat('Hello', ' ', 'World') AS concatenated,
    substring('ClickHouse', 1, 5) AS sub,
    length('Hello') AS len,
    lower('HELLO') AS lowered,
    upper('hello') AS uppered,
    trim(' hello ') AS trimmed,
    ltrim(' hello') AS ltrimmed,
    rtrim('hello ') AS rtrimmed,
    splitByChar(',', 'a,b,c') AS split,
    splitByString('::', 'a::b::c') AS split_str,
    replaceAll('hello world', 'world', 'universe') AS replaced,
    reverse('hello') AS reversed,
    position('hello', 'l') AS pos,
    match('hello', 'e.+o') AS regex_match,
    extract('hello123world', '\\d+') AS extracted;

-- Format functions
SELECT
    format('{} {}', 'Hello', 'World') AS formatted,
    formatReadableSize(1024 * 1024 * 1024) AS readable_size,
    formatReadableQuantity(1000000) AS readable_qty;
```
7.7 Date/Time Functions
```sql
-- Date/Time operations
SELECT
    now() AS current_time,
    today() AS current_date,
    yesterday() AS prev_date,
    toDate('2025-01-15') AS date,
    toDateTime('2025-01-15 10:30:00') AS datetime,
    toDateTime64('2025-01-15 10:30:00.123456', 6) AS datetime64,
    toYear(now()) AS year,
    toMonth(now()) AS month,
    toDayOfMonth(now()) AS day,
    toHour(now()) AS hour,
    toMinute(now()) AS minute,
    toSecond(now()) AS second,
    toDayOfWeek(now()) AS day_of_week,
    toDayOfYear(now()) AS day_of_year,
    toStartOfDay(now()) AS start_of_day,
    toStartOfHour(now()) AS start_of_hour,
    toStartOfMinute(now()) AS start_of_minute,
    toStartOfMonth(now()) AS start_of_month,
    toStartOfQuarter(now()) AS start_of_quarter,
    toStartOfYear(now()) AS start_of_year,
    toStartOfWeek(now()) AS start_of_week,
    date_add(DAY, 7, today()) AS week_later,
    date_sub(MONTH, 1, today()) AS month_ago,
    dateDiff('day', toDate('2025-01-01'), toDate('2025-01-15')) AS days_diff;
```
7.8 Conditional Functions
```sql
-- Conditional operations
SELECT
    if(value > 100, 'high', 'low') AS category,
    multiIf(
        value > 1000, 'very high',
        value > 100, 'high',
        value > 10, 'medium',
        'low'
    ) AS multi_category,
    CASE
        WHEN value > 1000 THEN 'very high'
        WHEN value > 100 THEN 'high'
        WHEN value > 10 THEN 'medium'
        ELSE 'low'
    END AS case_category,
    coalesce(nullable_value, 0) AS with_default,
    ifNull(nullable_value, 0) AS if_null,
    nullIf(value, 0) AS null_if_zero
FROM events;
```
8. Materialized View Migration
8.1 Standard Materialized Views
Materialized views work identically in HeliosDB:
```sql
-- Create target table
CREATE TABLE events_daily (
    date Date,
    event_type String,
    total_events UInt64,
    total_value Float64,
    unique_users UInt64
) ENGINE = SummingMergeTree((total_events, total_value, unique_users))
PARTITION BY toYYYYMM(date)
ORDER BY (date, event_type);

-- Create materialized view
CREATE MATERIALIZED VIEW events_daily_mv TO events_daily AS
SELECT
    toDate(timestamp) AS date,
    event_type,
    count() AS total_events,
    sum(value) AS total_value,
    uniq(user_id) AS unique_users
FROM events
GROUP BY date, event_type;
```
8.2 Aggregating Materialized Views
```sql
-- Hourly aggregations with AggregatingMergeTree
CREATE TABLE metrics_hourly (
    hour DateTime,
    metric_name String,
    count_state AggregateFunction(count, UInt64),
    sum_state AggregateFunction(sum, Float64),
    avg_state AggregateFunction(avg, Float64),
    min_state AggregateFunction(min, Float64),
    max_state AggregateFunction(max, Float64),
    uniq_state AggregateFunction(uniq, UInt32),
    quantile_state AggregateFunction(quantile(0.95), Float64)
) ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(hour)
ORDER BY (hour, metric_name);

CREATE MATERIALIZED VIEW metrics_hourly_mv TO metrics_hourly AS
SELECT
    toStartOfHour(timestamp) AS hour,
    metric_name,
    countState() AS count_state,
    sumState(value) AS sum_state,
    avgState(value) AS avg_state,
    minState(value) AS min_state,
    maxState(value) AS max_state,
    uniqState(user_id) AS uniq_state,
    quantileState(0.95)(value) AS quantile_state
FROM metrics
GROUP BY hour, metric_name;

-- Query with merge
SELECT
    hour,
    metric_name,
    countMerge(count_state) AS total_count,
    sumMerge(sum_state) AS total_sum,
    avgMerge(avg_state) AS average,
    minMerge(min_state) AS minimum,
    maxMerge(max_state) AS maximum,
    uniqMerge(uniq_state) AS unique_count,
    quantileMerge(0.95)(quantile_state) AS p95
FROM metrics_hourly
WHERE hour >= toStartOfDay(now())
GROUP BY hour, metric_name
ORDER BY hour;
```
8.3 Cascading Materialized Views
```sql
-- First level: hourly
CREATE TABLE events_hourly (
    hour DateTime,
    event_type String,
    count UInt64,
    value_sum Float64
) ENGINE = SummingMergeTree((count, value_sum))
ORDER BY (hour, event_type);

CREATE MATERIALIZED VIEW events_hourly_mv TO events_hourly AS
SELECT
    toStartOfHour(timestamp) AS hour,
    event_type,
    count() AS count,
    sum(value) AS value_sum
FROM events
GROUP BY hour, event_type;

-- Second level: daily (aggregates from hourly)
CREATE TABLE events_daily_summary (
    date Date,
    event_type String,
    count UInt64,
    value_sum Float64
) ENGINE = SummingMergeTree((count, value_sum))
ORDER BY (date, event_type);

CREATE MATERIALIZED VIEW events_daily_mv TO events_daily_summary AS
SELECT
    toDate(hour) AS date,
    event_type,
    sum(count) AS count,
    sum(value_sum) AS value_sum
FROM events_hourly
GROUP BY date, event_type;
```
8.4 Materialized View with POPULATE
```sql
-- Create a materialized view and backfill it with existing data.
-- Note: ClickHouse does not allow POPULATE together with TO <table>,
-- so define the view with an inline engine when backfilling:
CREATE MATERIALIZED VIEW user_stats_mv
ENGINE = MergeTree()
ORDER BY user_id
POPULATE
AS SELECT
    user_id,
    count() AS total_events,
    sum(value) AS total_value,
    max(timestamp) AS last_activity
FROM events
GROUP BY user_id;

-- With a TO target table, backfill manually instead:
-- INSERT INTO user_stats SELECT ... FROM events GROUP BY user_id;
```
9. Data Migration Process
9.1 Migration Strategy Selection
| Data Size | Recommended Method | Estimated Time |
|---|---|---|
| < 10 GB | INSERT SELECT via link | Minutes |
| 10-100 GB | COPY with files | 30 min - 2 hours |
| 100 GB - 1 TB | Parallel export/import | 2-8 hours |
| > 1 TB | Streaming or incremental | Hours to days |
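When scripting migrations across many databases, the strategy table above can be encoded directly. A minimal sketch (the function name `migration_method` is illustrative):

```python
def migration_method(size_gb: float) -> str:
    """Pick a migration method from the strategy table above.
    Thresholds are in GB; 1 TB is treated as 1024 GB."""
    if size_gb < 10:
        return "INSERT SELECT via link"
    if size_gb < 100:
        return "COPY with files"
    if size_gb < 1024:
        return "Parallel export/import"
    return "Streaming or incremental"

print(migration_method(50))  # COPY with files
```

Feeding this the per-database sizes from the section 3.4 inventory queries gives a quick first-pass migration plan for each database.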
9.2 Method 1: Direct INSERT SELECT (Small Datasets)
```sql
-- Connect HeliosDB to ClickHouse as an external source,
-- then copy data directly.

-- In HeliosDB (reads from ClickHouse's MySQL-protocol port 9004):
CREATE TABLE events_clickhouse (
    timestamp DateTime,
    event_type String,
    user_id UInt32,
    value Float64
) ENGINE = MySQL('clickhouse.host:9004', 'analytics', 'events', 'user', 'pass');

-- Copy data
INSERT INTO events SELECT * FROM events_clickhouse;
```
9.3 Method 2: Export/Import with Files
Export from ClickHouse
```bash
# Export to Native format (fastest)
clickhouse-client \
  --host clickhouse.host \
  --query "SELECT * FROM analytics.events FORMAT Native" \
  > events.native

# Export to CSV
clickhouse-client \
  --host clickhouse.host \
  --query "SELECT * FROM analytics.events FORMAT CSVWithNames" \
  > events.csv

# Export to Parquet (compressed)
clickhouse-client \
  --host clickhouse.host \
  --query "SELECT * FROM analytics.events FORMAT Parquet" \
  > events.parquet

# Export to JSON
clickhouse-client \
  --host clickhouse.host \
  --query "SELECT * FROM analytics.events FORMAT JSONEachRow" \
  > events.json
```
Import to HeliosDB
```bash
# Import from Native format
clickhouse-client \
  --host heliosdb.host \
  --query "INSERT INTO analytics.events FORMAT Native" \
  < events.native

# Import from CSV
clickhouse-client \
  --host heliosdb.host \
  --query "INSERT INTO analytics.events FORMAT CSVWithNames" \
  < events.csv

# Import from Parquet
clickhouse-client \
  --host heliosdb.host \
  --query "INSERT INTO analytics.events FORMAT Parquet" \
  < events.parquet
```
9.4 Method 3: Python Migration Script
```python
#!/usr/bin/env python3
"""
ClickHouse to HeliosDB Migration Script
Handles large tables with batching and progress tracking.
"""

from clickhouse_driver import Client
import time


def migrate_table(source_host, target_host, database, table, batch_size=100000):
    """Migrate a single table with batching."""
    source = Client(host=source_host, port=9000, database=database)
    target = Client(host=target_host, port=9000, database=database)

    # Get row count
    total_rows = source.execute(f'SELECT count() FROM {table}')[0][0]
    print(f"Migrating {table}: {total_rows:,} rows")

    # Get column info
    columns_result = source.execute(f'DESCRIBE TABLE {table}')
    columns = ', '.join([col[0] for col in columns_result])

    # Migrate in batches
    offset = 0
    migrated = 0
    start_time = time.time()

    while offset < total_rows:
        # Read batch from source.
        # Note: LIMIT/OFFSET without ORDER BY is not guaranteed stable;
        # for strict correctness, page by the table's ORDER BY key.
        rows = source.execute(
            f'SELECT {columns} FROM {table} LIMIT {batch_size} OFFSET {offset}'
        )
        if not rows:
            break

        # Write batch to target
        target.execute(f'INSERT INTO {table} ({columns}) VALUES', rows)

        offset += batch_size
        migrated += len(rows)

        # Progress update
        elapsed = time.time() - start_time
        rate = migrated / elapsed if elapsed > 0 else 0
        progress = (migrated / total_rows) * 100
        eta = (total_rows - migrated) / rate if rate > 0 else 0
        print(f"  Progress: {progress:.1f}% ({migrated:,}/{total_rows:,}) "
              f"Rate: {rate:,.0f} rows/sec ETA: {eta:.0f}s")

    elapsed = time.time() - start_time
    print(f"Completed {table}: {migrated:,} rows in {elapsed:.1f}s "
          f"({migrated / elapsed:,.0f} rows/sec)")
    return migrated


def migrate_database(source_host, target_host, database, tables=None):
    """Migrate an entire database or a specific list of tables."""
    source = Client(host=source_host, port=9000, database=database)

    # Get tables if not specified
    if tables is None:
        result = source.execute(
            f"SELECT name FROM system.tables WHERE database = '{database}'"
        )
        tables = [row[0] for row in result]

    print(f"Migrating {len(tables)} tables from {database}")

    total_migrated = 0
    for table in tables:
        try:
            total_migrated += migrate_table(
                source_host, target_host, database, table
            )
        except Exception as e:
            print(f"Error migrating {table}: {e}")
            continue

    print(f"\nTotal migrated: {total_migrated:,} rows")
    return total_migrated


if __name__ == "__main__":
    # Configuration
    SOURCE_HOST = 'clickhouse.host'
    TARGET_HOST = 'heliosdb.host'
    DATABASE = 'analytics'

    # Migrate all tables
    migrate_database(SOURCE_HOST, TARGET_HOST, DATABASE)
```
9.5 Method 4: Parallel Migration for Large Datasets
```python
#!/usr/bin/env python3
"""Parallel migration for large ClickHouse tables.

Uses multiple workers for concurrent data transfer.
"""

from clickhouse_driver import Client
from concurrent.futures import ThreadPoolExecutor, as_completed
import threading
import time


class ParallelMigrator:
    def __init__(self, source_host, target_host, database, num_workers=4):
        self.source_host = source_host
        self.target_host = target_host
        self.database = database
        self.num_workers = num_workers
        self.lock = threading.Lock()
        self.total_migrated = 0

    def get_client(self, host):
        return Client(host=host, port=9000, database=self.database)

    def migrate_partition(self, table, partition_id, columns):
        """Migrate a single partition."""
        source = self.get_client(self.source_host)
        target = self.get_client(self.target_host)

        # Export partition data
        rows = source.execute(f"""
            SELECT {columns} FROM {table}
            WHERE _partition_id = '{partition_id}'
        """)

        if rows:
            target.execute(f'INSERT INTO {table} ({columns}) VALUES', rows)

        with self.lock:
            self.total_migrated += len(rows)

        return len(rows)

    def migrate_table(self, table):
        """Migrate table using parallel partition processing."""
        source = self.get_client(self.source_host)

        # Get partitions
        partitions = source.execute(f'SELECT DISTINCT _partition_id FROM {table}')
        partition_ids = [p[0] for p in partitions]

        # Get columns
        columns_result = source.execute(f'DESCRIBE TABLE {table}')
        columns = ', '.join([col[0] for col in columns_result])

        print(f"Migrating {table}: {len(partition_ids)} partitions")
        start_time = time.time()

        with ThreadPoolExecutor(max_workers=self.num_workers) as executor:
            futures = {
                executor.submit(self.migrate_partition, table, pid, columns): pid
                for pid in partition_ids
            }

            completed = 0
            for future in as_completed(futures):
                partition_id = futures[future]
                try:
                    rows = future.result()
                    completed += 1
                    print(f"  Partition {partition_id}: {rows:,} rows "
                          f"({completed}/{len(partition_ids)})")
                except Exception as e:
                    print(f"  Error in partition {partition_id}: {e}")

        elapsed = time.time() - start_time
        print(f"Completed {table}: {self.total_migrated:,} rows "
              f"in {elapsed:.1f}s")


if __name__ == "__main__":
    migrator = ParallelMigrator(
        source_host='clickhouse.host',
        target_host='heliosdb.host',
        database='analytics',
        num_workers=8
    )
    migrator.migrate_table('events')
```

9.6 Method 5: Incremental/CDC Migration
For zero-downtime migrations with continuous data sync:
```python
#!/usr/bin/env python3
"""Incremental migration with change tracking.

Suitable for zero-downtime migrations.
"""

from clickhouse_driver import Client
import time


def incremental_sync(source_host, target_host, database, table,
                     timestamp_column='timestamp', interval_seconds=60):
    """Continuously sync new data from source to target."""
    source = Client(host=source_host, port=9000, database=database)
    target = Client(host=target_host, port=9000, database=database)

    # Get columns
    columns_result = source.execute(f'DESCRIBE TABLE {table}')
    columns = ', '.join([col[0] for col in columns_result])

    # Get initial watermark from target
    result = target.execute(f'SELECT max({timestamp_column}) FROM {table}')
    watermark = result[0][0] or '1970-01-01 00:00:00'

    print(f"Starting incremental sync from {watermark}")

    while True:
        # Get new rows since watermark
        rows = source.execute(f"""
            SELECT {columns} FROM {table}
            WHERE {timestamp_column} > '{watermark}'
            ORDER BY {timestamp_column}
            LIMIT 100000
        """)

        if rows:
            # Insert to target
            target.execute(f'INSERT INTO {table} ({columns}) VALUES', rows)

            # Update watermark (assumes the first column is the timestamp)
            watermark = max(row[0] for row in rows)
            print(f"Synced {len(rows)} rows up to {watermark}")
        else:
            print(f"No new data (watermark: {watermark})")

        # Wait before next sync
        time.sleep(interval_seconds)


if __name__ == "__main__":
    incremental_sync(
        source_host='clickhouse.host',
        target_host='heliosdb.host',
        database='analytics',
        table='events',
        timestamp_column='timestamp',
        interval_seconds=30
    )
```

9.7 Schema Migration Script
```bash
#!/bin/bash
# migrate_schema.sh - Export and import ClickHouse schema to HeliosDB

SOURCE_HOST="clickhouse.host"
TARGET_HOST="heliosdb.host"
DATABASE="analytics"

echo "=== Schema Migration: ClickHouse -> HeliosDB ==="

# Export schema (including Replicated* tables; they are converted below)
echo "Exporting schema from $SOURCE_HOST..."
clickhouse-client --host $SOURCE_HOST --query "
    SELECT create_table_query
    FROM system.tables
    WHERE database = '$DATABASE'
" > schema_export.sql

# Handle ReplicatedMergeTree -> MergeTree conversion
echo "Converting Replicated* engines to standard engines..."
sed -i 's/ReplicatedMergeTree([^)]*)/MergeTree()/g' schema_export.sql
sed -i 's/ReplicatedReplacingMergeTree([^)]*)/ReplacingMergeTree()/g' schema_export.sql
sed -i 's/ReplicatedSummingMergeTree([^)]*)/SummingMergeTree()/g' schema_export.sql
sed -i 's/ReplicatedAggregatingMergeTree([^)]*)/AggregatingMergeTree()/g' schema_export.sql

# Create database in HeliosDB
echo "Creating database in HeliosDB..."
clickhouse-client --host $TARGET_HOST --query "CREATE DATABASE IF NOT EXISTS $DATABASE"

# Import schema
echo "Importing schema to $TARGET_HOST..."
while IFS= read -r query; do
    if [ -n "$query" ]; then
        clickhouse-client --host $TARGET_HOST --query "$query" || \
            echo "Failed: $query"
    fi
done < schema_export.sql

echo "Schema migration complete!"
```

10. Application Connectivity
10.1 Python (clickhouse-driver)
```python
from clickhouse_driver import Client

# Before (ClickHouse)
client = Client(
    host='clickhouse.host',
    port=9000,
    user='default',
    password='password',
    database='analytics',
    compression=True
)

# After (HeliosDB) - Only host changes
client = Client(
    host='heliosdb.host',  # Change only this
    port=9000,
    user='default',
    password='password',
    database='analytics',
    compression=True
)

# Usage remains identical
result = client.execute('''
    SELECT event_type, count() AS cnt
    FROM events
    WHERE timestamp > now() - INTERVAL 1 DAY
    GROUP BY event_type
    ORDER BY cnt DESC
''')

for row in result:
    print(f"{row[0]}: {row[1]}")
```

10.2 Python (clickhouse-connect)
```python
import clickhouse_connect

# Before (ClickHouse)
client = clickhouse_connect.get_client(
    host='clickhouse.host',
    port=8123,
    username='default',
    password='password',
    database='analytics'
)

# After (HeliosDB) - Only host changes
client = clickhouse_connect.get_client(
    host='heliosdb.host',  # Change only this
    port=8123,
    username='default',
    password='password',
    database='analytics'
)

# Query with Pandas integration
df = client.query_df('''
    SELECT
        toDate(timestamp) AS date,
        event_type,
        count() AS events
    FROM events
    GROUP BY date, event_type
''')

print(df.head())
```

10.3 Go (clickhouse-go)
```go
package main

import (
	"context"
	"fmt"

	"github.com/ClickHouse/clickhouse-go/v2"
)

func main() {
	// Before (ClickHouse)
	// conn, _ := clickhouse.Open(&clickhouse.Options{
	//     Addr: []string{"clickhouse.host:9000"},
	//     ...
	// })

	// After (HeliosDB) - Only address changes
	conn, err := clickhouse.Open(&clickhouse.Options{
		Addr: []string{"heliosdb.host:9000"}, // Change only this
		Auth: clickhouse.Auth{
			Database: "analytics",
			Username: "default",
			Password: "password",
		},
		Compression: &clickhouse.Compression{
			Method: clickhouse.CompressionLZ4,
		},
		Settings: clickhouse.Settings{
			"max_execution_time": 60,
		},
	})
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	ctx := context.Background()

	rows, err := conn.Query(ctx, `
		SELECT event_type, count() AS cnt
		FROM events
		WHERE timestamp > now() - INTERVAL 1 DAY
		GROUP BY event_type
	`)
	if err != nil {
		panic(err)
	}
	defer rows.Close()

	for rows.Next() {
		var eventType string
		var count uint64
		rows.Scan(&eventType, &count)
		fmt.Printf("%s: %d\n", eventType, count)
	}
}
```

10.4 Node.js (@clickhouse/client)
```javascript
const { createClient } = require('@clickhouse/client');

// Before (ClickHouse)
// const client = createClient({
//   host: 'http://clickhouse.host:8123',
//   ...
// });

// After (HeliosDB) - Only host changes
const client = createClient({
  host: 'http://heliosdb.host:8123', // Change only this
  database: 'analytics',
  username: 'default',
  password: 'password',
  compression: {
    request: true,
    response: true,
  },
});

async function queryEvents() {
  const result = await client.query({
    query: `
      SELECT event_type, count() AS cnt
      FROM events
      WHERE timestamp > now() - INTERVAL 1 DAY
      GROUP BY event_type
      ORDER BY cnt DESC
    `,
    format: 'JSONEachRow',
  });

  const data = await result.json();
  for (const row of data) {
    console.log(`${row.event_type}: ${row.cnt}`);
  }
}

async function insertEvents() {
  const events = [
    { timestamp: new Date(), event_type: 'click', user_id: 1, value: 10.5 },
    { timestamp: new Date(), event_type: 'view', user_id: 2, value: 5.0 },
  ];

  await client.insert({
    table: 'events',
    values: events,
    format: 'JSONEachRow',
  });
}

queryEvents();
```

10.5 Java (JDBC)
```java
import java.sql.*;
import java.util.Properties;

public class HeliosDBJDBC {
    public static void main(String[] args) throws SQLException {
        // Before (ClickHouse)
        // String url = "jdbc:clickhouse://clickhouse.host:8123/analytics";

        // After (HeliosDB) - Only host changes
        String url = "jdbc:clickhouse://heliosdb.host:8123/analytics";

        Properties props = new Properties();
        props.setProperty("user", "default");
        props.setProperty("password", "password");
        props.setProperty("compress", "true");

        try (Connection conn = DriverManager.getConnection(url, props);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT event_type, count() AS cnt " +
                 "FROM events " +
                 "WHERE timestamp > now() - INTERVAL 1 DAY " +
                 "GROUP BY event_type"
             )) {

            while (rs.next()) {
                System.out.printf("%s: %d%n",
                    rs.getString("event_type"),
                    rs.getLong("cnt"));
            }
        }
    }
}
```

10.6 Java (Native Client)
```java
import com.clickhouse.client.*;
import com.clickhouse.data.*;

public class HeliosDBNative {
    public static void main(String[] args) throws Exception {
        // Before (ClickHouse)
        // ClickHouseNode server = ClickHouseNode.of("clickhouse.host:9000");

        // After (HeliosDB) - Only host changes
        ClickHouseNode server = ClickHouseNode.builder()
            .host("heliosdb.host") // Change only this
            .port(9000)
            .database("analytics")
            .credentials(ClickHouseCredentials.fromUserAndPassword("default", "password"))
            .build();

        try (ClickHouseClient client = ClickHouseClient.newInstance(ClickHouseProtocol.NATIVE);
             ClickHouseResponse response = client.read(server)
                 .query("SELECT event_type, count() FROM events GROUP BY event_type")
                 .executeAndWait()) {

            for (ClickHouseRecord record : response.records()) {
                System.out.printf("%s: %d%n",
                    record.getValue(0).asString(),
                    record.getValue(1).asLong());
            }
        }
    }
}
```

10.7 HTTP API (curl)
```bash
# Before (ClickHouse)
# curl 'http://clickhouse.host:8123/' --data-binary "SELECT 1"

# After (HeliosDB) - Only host changes
curl 'http://heliosdb.host:8123/' \
  --data-binary "SELECT event_type, count() FROM events GROUP BY event_type"

# With authentication
curl 'http://heliosdb.host:8123/?user=default&password=password' \
  --data-binary "SELECT * FROM events LIMIT 10"

# With database
curl 'http://heliosdb.host:8123/?database=analytics' \
  --data-binary "SELECT count() FROM events"

# JSON output
curl 'http://heliosdb.host:8123/' \
  --data-binary "SELECT * FROM events FORMAT JSONEachRow"

# Insert data
curl 'http://heliosdb.host:8123/?query=INSERT%20INTO%20events%20FORMAT%20JSONEachRow' \
  --data-binary '{"timestamp":"2025-01-15 10:30:00","event_type":"click","user_id":1,"value":10.5}'
```

10.8 Environment Variable Configuration
```bash
# Before (ClickHouse)
export CLICKHOUSE_HOST=clickhouse.host

# After (HeliosDB)
export CLICKHOUSE_HOST=heliosdb.host

# Common environment variables
export CLICKHOUSE_PORT=9000
export CLICKHOUSE_HTTP_PORT=8123
export CLICKHOUSE_USER=default
export CLICKHOUSE_PASSWORD=password
export CLICKHOUSE_DATABASE=analytics
```

11. Performance Considerations
11.1 HeliosDB Performance Configuration
```toml
# heliosdb.toml - ClickHouse protocol optimization

[clickhouse]
# Network settings
listen_address = "0.0.0.0"
native_port = 9000
http_port = 8123
max_connections = 4096

[clickhouse.protocol]
# Enable compression
compression_enabled = true
compression_algorithms = ["lz4", "zstd"]
default_compression = "lz4"

# Increase buffer sizes
max_query_size = 268435456        # 256MB
max_insert_block_size = 1048576   # 1M rows

[clickhouse.query]
# Query optimization
max_threads = 16
max_execution_time = 600
max_memory_usage = 10737418240    # 10GB
use_uncompressed_cache = true
background_pool_size = 16

[clickhouse.mergetree]
# MergeTree settings
index_granularity = 8192
min_bytes_for_wide_part = 10485760   # 10MB
max_parts_in_total = 100000
merge_max_block_size = 8192
```

11.2 Query Performance Tips
Use PREWHERE
```sql
-- PREWHERE filters before reading columns
SELECT user_id, event_type, value
FROM events
PREWHERE timestamp > '2025-01-01'   -- Filter early
WHERE event_type = 'purchase';      -- Filter late
```

Optimize ORDER BY

```sql
-- Design ORDER BY to match common query patterns
CREATE TABLE events (
    timestamp DateTime,
    user_id UInt32,
    event_type String,
    value Float64
) ENGINE = MergeTree()
-- If you query by user_id and timestamp:
ORDER BY (user_id, timestamp)
PARTITION BY toYYYYMM(timestamp);

-- Queries matching ORDER BY are fast
SELECT * FROM events
WHERE user_id = 12345 AND timestamp > '2025-01-01';
```

Use Sampling

```sql
-- Approximate queries on samples
SELECT event_type, count() * 10 AS estimated_count
FROM events SAMPLE 0.1
GROUP BY event_type;

-- Deterministic sampling
SELECT event_type, count() * 100 AS estimated_count
FROM events SAMPLE 0.01 OFFSET 0.5
GROUP BY event_type;
```

Batch Inserts

```python
# Insert in batches of 10K-100K rows
batch_size = 50000
for i in range(0, len(data), batch_size):
    batch = data[i:i+batch_size]
    client.execute(
        'INSERT INTO events VALUES',
        batch,
        types_check=True
    )
```

11.3 Index Optimization
```sql
-- Skip indexes for filtering
CREATE TABLE logs (
    timestamp DateTime,
    level String,
    message String,
    INDEX idx_level level TYPE set(100) GRANULARITY 4,
    INDEX idx_message message TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 4
) ENGINE = MergeTree()
ORDER BY timestamp;

-- Bloom filter index for high-cardinality columns
CREATE TABLE events (
    timestamp DateTime,
    user_id UInt32,
    session_id UUID,
    INDEX idx_session session_id TYPE bloom_filter(0.01) GRANULARITY 1
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id);
```

11.4 Materialized Views for Pre-aggregation

```sql
-- Pre-aggregate frequently-queried metrics
CREATE MATERIALIZED VIEW hourly_metrics
ENGINE = AggregatingMergeTree()
ORDER BY (hour, event_type)
AS SELECT
    toStartOfHour(timestamp) AS hour,
    event_type,
    countState() AS count_state,
    sumState(value) AS sum_state,
    avgState(value) AS avg_state
FROM events
GROUP BY hour, event_type;

-- Query from materialized view (much faster)
SELECT
    hour,
    event_type,
    countMerge(count_state) AS total,
    sumMerge(sum_state) AS sum,
    avgMerge(avg_state) AS avg
FROM hourly_metrics
WHERE hour >= toStartOfDay(now())
GROUP BY hour, event_type;
```

11.5 Performance Benchmarks
| Operation | ClickHouse | HeliosDB | Notes |
|---|---|---|---|
| count() 1B rows | 85ms | 90ms | Comparable |
| GROUP BY 100M rows | 2.1s | 2.0s | Slightly faster |
| Bulk insert 1M rows | 0.65s | 0.67s | Comparable |
| Complex JOIN | 3.5s | 3.2s | Better optimizer |
| Window functions | 4.2s | 3.8s | SIMD acceleration |
11.6 Monitoring Performance
```sql
-- HeliosDB performance views
SELECT * FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_duration_ms > 1000
ORDER BY query_duration_ms DESC
LIMIT 20;

-- Check table sizes
SELECT
    database,
    name,
    formatReadableSize(total_bytes) AS size,
    formatReadableQuantity(total_rows) AS rows
FROM system.tables
WHERE database = 'analytics'
ORDER BY total_bytes DESC;

-- Check part status
SELECT
    table,
    partition,
    count() AS parts,
    sum(rows) AS rows,
    formatReadableSize(sum(bytes)) AS size
FROM system.parts
WHERE database = 'analytics' AND active
GROUP BY table, partition
ORDER BY parts DESC;
```

12. Troubleshooting Common Issues
12.1 Connection Issues
Issue: Connection Refused
```
Code: 210. DB::NetException: Connection refused
```

Solutions:

```bash
# Check HeliosDB is running
systemctl status heliosdb

# Verify port is listening
netstat -tlnp | grep -E '9000|8123'

# Check firewall
firewall-cmd --list-ports | grep -E '9000|8123'

# Test connectivity
nc -zv heliosdb.host 9000
nc -zv heliosdb.host 8123
```

Issue: Authentication Failed

```
Code: 516. DB::Exception: Authentication failed
```

Solutions:

```python
from clickhouse_driver import Client

# Ensure credentials are correct
client = Client(
    host='heliosdb.host',
    port=9000,
    user='default',
    password='correct_password',  # Verify password
    database='analytics'
)
```

12.2 Query Issues
Issue: Memory Limit Exceeded
```
Code: 241. DB::Exception: Memory limit exceeded
```

Solutions:

```sql
-- Increase memory limit for query
SET max_memory_usage = 20000000000;  -- 20GB

-- Or use sampling
SELECT ... FROM events SAMPLE 0.1 ...

-- Or break into smaller queries
SELECT ... FROM events WHERE timestamp >= '2025-01-01' AND timestamp < '2025-01-15';
SELECT ... FROM events WHERE timestamp >= '2025-01-15' AND timestamp < '2025-02-01';
```

Issue: Query Timeout

```
Code: 159. DB::Exception: Timeout exceeded
```

Solutions:

```python
# Increase timeout
client = Client(
    host='heliosdb.host',
    port=9000,
    settings={
        'max_execution_time': 600,   # 10 minutes
        'send_receive_timeout': 600
    }
)
```

Issue: Too Many Parts

```
Code: 252. DB::Exception: Too many parts
```

Solutions:

```sql
-- Check part counts
SELECT table, count() AS parts
FROM system.parts
WHERE database = 'analytics' AND active
GROUP BY table
ORDER BY parts DESC;

-- Optimize tables
OPTIMIZE TABLE analytics.events FINAL;

-- Adjust settings
ALTER TABLE events MODIFY SETTING
    parts_to_delay_insert = 500,
    parts_to_throw_insert = 600;
```

12.3 Data Type Issues
Issue: Type Mismatch
```
Code: 53. DB::Exception: Type mismatch
```

Solutions:

```python
# Ensure proper type conversion
from datetime import datetime

# Correct types
data = [
    (datetime.now(), 'click', 123, 10.5),  # DateTime, String, UInt32, Float64
]

client.execute(
    'INSERT INTO events (timestamp, event_type, user_id, value) VALUES',
    data,
    types_check=True  # Enable type checking
)
```

Issue: Nullable Type Errors

```
Code: 349. DB::Exception: Cannot insert NULL value
```

Solutions:

```sql
-- Check column nullability
DESCRIBE TABLE events;

-- Alter column to Nullable
ALTER TABLE events MODIFY COLUMN optional_field Nullable(String);

-- Or handle NULLs in insert
INSERT INTO events
SELECT
    timestamp,
    event_type,
    COALESCE(user_id, 0) AS user_id,  -- Default for NULL
    value
FROM source_table;
```

12.4 Migration Issues
Issue: Schema Incompatibility
Solutions:
```bash
# Check for ReplicatedMergeTree engines
grep -i "Replicated" schema_export.sql

# Convert to standard engines
sed -i 's/ReplicatedMergeTree([^)]*)/MergeTree()/g' schema_export.sql
```

Issue: Data Loss During Migration

Solutions:

```sql
-- Verify row counts before and after
-- Source
SELECT count() FROM events;  -- On ClickHouse

-- Target
SELECT count() FROM events;  -- On HeliosDB

-- If mismatch, check for errors during migration
-- Re-run migration with smaller batches
```

Issue: Performance Degradation

Solutions:

```sql
-- Rebuild indexes after migration
OPTIMIZE TABLE events FINAL;

-- Update statistics
-- HeliosDB handles this automatically

-- Check query plans
EXPLAIN SELECT * FROM events WHERE user_id = 12345;
```

12.5 Materialized View Issues
Issue: View Not Updating
Solutions:
```sql
-- Check if view is attached
SELECT name, is_populated FROM system.tables WHERE engine = 'MaterializedView';

-- Detach and reattach
DETACH TABLE events_daily_mv;
ATTACH TABLE events_daily_mv;

-- Or recreate and backfill (POPULATE cannot be combined with TO)
DROP TABLE events_daily_mv;
CREATE MATERIALIZED VIEW events_daily_mv TO events_daily
AS SELECT ...;
INSERT INTO events_daily SELECT ...;
```

12.6 TTL Issues
Issue: Data Not Expiring
Solutions:
```sql
-- Check TTL settings
SELECT name, engine_full FROM system.tables WHERE name = 'events';

-- Force TTL processing
ALTER TABLE events MATERIALIZE TTL;

-- Check TTL status
SELECT
    table,
    partition,
    rows,
    delete_ttl_info
FROM system.parts
WHERE table = 'events' AND active;
```

13. Post-Migration Validation
13.1 Row Count Verification
```python
#!/usr/bin/env python3
"""Validate row counts between ClickHouse and HeliosDB."""

from clickhouse_driver import Client


def validate_counts(source_host, target_host, database, tables):
    source = Client(host=source_host, port=9000, database=database)
    target = Client(host=target_host, port=9000, database=database)

    results = {}
    for table in tables:
        source_count = source.execute(f'SELECT count() FROM {table}')[0][0]
        target_count = target.execute(f'SELECT count() FROM {table}')[0][0]

        match = source_count == target_count
        results[table] = {
            'source': source_count,
            'target': target_count,
            'match': match
        }

        status = "OK" if match else "MISMATCH"
        print(f"{table}: Source={source_count:,} Target={target_count:,} [{status}]")

    return results


# Validate
tables = ['events', 'users', 'metrics']
validate_counts('clickhouse.host', 'heliosdb.host', 'analytics', tables)
```

13.2 Schema Validation
```sql
-- Compare schemas
-- On ClickHouse:
SHOW CREATE TABLE analytics.events;

-- On HeliosDB:
SHOW CREATE TABLE analytics.events;

-- Check column types
SELECT name, type, default_kind, default_expression
FROM system.columns
WHERE database = 'analytics' AND table = 'events';
```

13.3 Data Integrity Checks
```python
from clickhouse_driver import Client


def validate_sample_data(source_host, target_host, database, table,
                         key_column, sample_size=1000):
    """Compare sample rows between systems."""
    source = Client(host=source_host, port=9000, database=database)
    target = Client(host=target_host, port=9000, database=database)

    # Get sample keys
    keys = source.execute(f'''
        SELECT DISTINCT {key_column}
        FROM {table}
        ORDER BY rand()
        LIMIT {sample_size}
    ''')

    mismatches = 0
    for (key,) in keys:
        source_row = source.execute(
            f'SELECT * FROM {table} WHERE {key_column} = %(key)s',
            {'key': key}
        )
        target_row = target.execute(
            f'SELECT * FROM {table} WHERE {key_column} = %(key)s',
            {'key': key}
        )

        if source_row != target_row:
            mismatches += 1
            print(f"Mismatch for {key_column}={key}")

    print(f"Validation: {sample_size - mismatches}/{sample_size} rows match")
    return mismatches == 0
```

13.4 Query Performance Comparison
```python
import time

from clickhouse_driver import Client


def compare_query_performance(source_host, target_host, database, queries):
    """Compare query execution times."""
    source = Client(host=source_host, port=9000, database=database)
    target = Client(host=target_host, port=9000, database=database)

    for query in queries:
        # Source timing
        start = time.time()
        source.execute(query)
        source_time = time.time() - start

        # Target timing
        start = time.time()
        target.execute(query)
        target_time = time.time() - start

        diff = ((target_time - source_time) / source_time) * 100
        status = "faster" if diff < 0 else "slower"

        print(f"Query: {query[:50]}...")
        print(f"  ClickHouse: {source_time:.3f}s")
        print(f"  HeliosDB:   {target_time:.3f}s ({abs(diff):.1f}% {status})")


# Compare
queries = [
    "SELECT count() FROM events",
    "SELECT event_type, count() FROM events GROUP BY event_type",
    "SELECT user_id, sum(value) FROM events GROUP BY user_id ORDER BY sum(value) DESC LIMIT 100",
]
compare_query_performance('clickhouse.host', 'heliosdb.host', 'analytics', queries)
```

13.5 Application Smoke Tests
- All application endpoints responding correctly
- Query latencies within acceptable range (< 10% variance)
- No increase in error rates
- Dashboard visualizations rendering correctly
- Batch jobs completing successfully
- Materialized views updating correctly
- TTL expiration working as expected
- INSERT operations completing successfully
- Complex aggregations returning correct results
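The checklist above can also be scripted. Below is a minimal, hypothetical harness (the `run_smoke_tests` and `summarize` helpers are illustrative, not part of HeliosDB or any driver); each check would wrap a real query against the migrated system and raise on failure:

```python
# Minimal smoke-test harness (illustrative sketch). Each check is a
# zero-argument callable that raises (AssertionError, driver error, ...)
# when the check fails.

def run_smoke_tests(checks):
    """Run (name, callable) pairs; a check passes if it doesn't raise."""
    results = []
    for name, check in checks:
        try:
            check()
            results.append((name, True, None))
        except Exception as e:  # failed assertion or query error
            results.append((name, False, str(e)))
    return results


def summarize(results):
    """Return (passed_count, list_of_failed_check_names)."""
    failed = [name for name, ok, _ in results if not ok]
    return len(results) - len(failed), failed


if __name__ == "__main__":
    def row_count_positive():
        # In practice, e.g.:
        # assert client.execute('SELECT count() FROM events')[0][0] > 0
        assert 1 > 0

    checks = [("row_count_positive", row_count_positive)]
    passed, failed = summarize(run_smoke_tests(checks))
    print(f"{passed}/{len(checks)} checks passed; failed: {failed}")
```

In practice each checklist item above becomes one named check backed by a query or HTTP call, so a single run reports every failure instead of stopping at the first.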
13.6 Automated Validation Script
```bash
#!/bin/bash
# Post-migration validation script

SOURCE_HOST="clickhouse.host"
TARGET_HOST="heliosdb.host"
DATABASE="analytics"

echo "=== Post-Migration Validation ==="

# 1. Connectivity test
echo "Testing connectivity..."
clickhouse-client --host $TARGET_HOST --query "SELECT 1" || exit 1

# 2. Compare table counts
echo "Comparing row counts..."
for table in events users metrics; do
    source_count=$(clickhouse-client --host $SOURCE_HOST \
        --query "SELECT count() FROM $DATABASE.$table")
    target_count=$(clickhouse-client --host $TARGET_HOST \
        --query "SELECT count() FROM $DATABASE.$table")

    if [ "$source_count" == "$target_count" ]; then
        echo "$table: OK ($source_count rows)"
    else
        echo "$table: MISMATCH (source=$source_count, target=$target_count)"
    fi
done

# 3. Query performance
echo "Testing query performance..."
for i in 1 2 3; do
    time clickhouse-client --host $TARGET_HOST \
        --query "SELECT event_type, count() FROM $DATABASE.events GROUP BY event_type" \
        > /dev/null
done

# 4. Test materialized views
echo "Testing materialized views..."
clickhouse-client --host $TARGET_HOST \
    --query "SELECT name FROM system.tables WHERE engine = 'MaterializedView'"

echo "=== Validation Complete ==="
```

Appendix
A. Quick Reference Commands
```bash
# Connect to HeliosDB
clickhouse-client --host heliosdb.host --port 9000

# Export schema
clickhouse-client --host clickhouse.host \
    --query "SELECT create_table_query FROM system.tables WHERE database = 'analytics'"

# Export data (Native format)
clickhouse-client --host clickhouse.host \
    --query "SELECT * FROM analytics.events FORMAT Native" > events.native

# Import data
clickhouse-client --host heliosdb.host \
    --query "INSERT INTO analytics.events FORMAT Native" < events.native

# Verify counts
clickhouse-client --host heliosdb.host \
    --query "SELECT count() FROM analytics.events"
```

B. Migration Checklist Summary
Pre-Migration
- Backup ClickHouse data
- Document cluster topology
- Inventory tables and engines
- Analyze data volume
- Review query patterns
- Plan migration window
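For the inventory step above, it can help to pre-classify which tables port directly and which need an engine rewrite. A minimal sketch, assuming the `(name, engine)` pairs come from `SELECT name, engine FROM system.tables` (the `classify_engines` helper is illustrative, not part of any tool):

```python
# Sketch: classify table engines for the pre-migration inventory.
# Input: (table_name, engine_name) pairs, e.g. from system.tables.

def classify_engines(tables):
    """Split (name, engine) pairs into directly portable vs. needs-conversion."""
    direct, convert = [], []
    for name, engine in tables:
        if engine.startswith("Replicated"):
            # Replicated* engines map to their non-replicated counterparts;
            # HeliosDB handles replication itself.
            convert.append((name, engine, engine.replace("Replicated", "", 1)))
        else:
            direct.append((name, engine))
    return direct, convert


if __name__ == "__main__":
    inventory = [("events", "ReplicatedMergeTree"), ("users", "MergeTree")]
    direct, convert = classify_engines(inventory)
    print("portable as-is:", direct)
    print("needs engine conversion:", convert)
```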
Migration
- Export schemas
- Convert Replicated* engines
- Import schemas to HeliosDB
- Migrate data (choose method)
- Update application connections
Post-Migration
- Verify row counts
- Check schema integrity
- Validate sample data
- Compare query performance
- Run application tests
- Monitor error rates
- Optimize tables
C. Sample Migration Timeline
| Phase | Duration | Activities |
|---|---|---|
| Assessment | 1-2 days | Inventory, engine review, sizing |
| Preparation | 1-3 days | HeliosDB setup, schema conversion |
| Schema Migration | 1-2 hours | Export, convert, import |
| Data Migration | Varies | Depends on data size |
| Validation | 1-2 days | Testing, performance comparison |
| Cutover | 1-4 hours | Connection updates, final sync |
| Monitoring | 1-2 weeks | Performance monitoring |
D. Compatibility Notes
Fully Compatible Features
- Native TCP protocol (port 9000)
- HTTP protocol (port 8123)
- All MergeTree family engines
- All standard data types
- Materialized views
- Dictionaries
- TTL
- PREWHERE/WHERE/FINAL
- All JOIN types
- Window functions
- Aggregate functions
Migration Required
- ReplicatedMergeTree -> MergeTree (HeliosDB handles replication)
- ZooKeeper paths -> HeliosDB replication config
- Cluster settings -> HeliosDB cluster config
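The engine rewrite in the first item can also be done programmatically instead of with the sed commands in section 9.7. A minimal sketch, assuming DDL strings exported from `system.tables`; like the sed approach, it drops all engine arguments (including non-replication ones such as a ReplacingMergeTree version column), so converted DDL should still be reviewed by hand:

```python
import re

# Rewrite Replicated*MergeTree('/zk/path', 'replica', ...) -> *MergeTree().
# Caveat: this drops ALL engine arguments, not just the replication path
# and replica name - re-add engine-specific arguments manually if needed.
REPLICATED_ENGINE = re.compile(r"Replicated(\w*MergeTree)\([^)]*\)")


def convert_engine(ddl: str) -> str:
    """Convert Replicated* engine clauses in a DDL string to standard engines."""
    return REPLICATED_ENGINE.sub(r"\1()", ddl)
```

For example, `convert_engine("ENGINE = ReplicatedMergeTree('/ch/tables/events', 'r1')")` yields `"ENGINE = MergeTree()"`.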
Behavioral Differences
- Replication: Uses HeliosDB native replication
- Storage: Uses HeliosDB storage engine
- Keeper: Not required (no ZooKeeper dependency)
- Some advanced settings may differ
Related Documentation
- ClickHouse Protocol README
- ClickHouse Compatibility Matrix
- ClickHouse Configuration
- ClickHouse Examples
- General Migration Guide
- Protocol Compatibility Matrix
- PostgreSQL Migration Guide
- Cassandra Migration Guide
Need Help?
- Documentation: docs.heliosdb.io
- Community: community.heliosdb.io
- Support: support@heliosdb.io
Document Version History:
| Version | Date | Changes |
|---|---|---|
| 1.0 | January 2026 | Initial release |