MongoDB to HeliosDB Migration Guide
Version: 1.0 Last Updated: 2026-01-04
Table of Contents
- Introduction
- Compatibility Overview
- Pre-Migration Checklist
- Conceptual Mapping
- Step-by-Step Migration Process
- Query Translation Examples
- Feature Mapping
- Post-Migration Validation
- Performance Considerations
- Common Issues and Troubleshooting
1. Introduction
1.1 Why Migrate from MongoDB to HeliosDB?
HeliosDB offers significant advantages over MongoDB while maintaining wire protocol compatibility:
| Aspect | MongoDB | HeliosDB |
|---|---|---|
| Data Model | Document-only | Unified multi-model (Document, Relational, Graph, Vector, Time-Series) |
| Query Language | MongoDB Query Language only | MQL, SQL, GraphQL, CQL, Redis commands |
| Transactions | Snapshot isolation | SERIALIZABLE isolation with zero-maintenance MVCC |
| Joins | $lookup only | Native SQL JOINs + $lookup |
| Analytics | Limited aggregation | Full OLAP + OLTP (HTAP) |
| Full-Text Search | External (Atlas Search) | Built-in with GraphRAG integration |
| Change Streams | Standard | CDC-backed with lower latency |
| Vector Search | Atlas Vector Search | Native HNSW with multi-modal embeddings |
| Licensing | SSPL | Enterprise-friendly |
| Sharding | Manual configuration | Transparent horizontal scaling |
1.2 Migration Benefits
- Unified Platform: Consolidate multiple database systems into one
- SQL Access: Query MongoDB-style documents with standard SQL
- Enhanced Analytics: Run complex analytical queries on document data
- Better Transactions: ACID guarantees with serializable isolation
- Cost Reduction: Single database license, reduced operational overhead
- GraphRAG Integration: AI-powered semantic search on document content
1.3 Wire Protocol Compatibility
HeliosDB implements the MongoDB wire protocol (OP_MSG), allowing existing MongoDB applications to connect without code changes:
```python
# Existing MongoDB code works unchanged
from pymongo import MongoClient

# Simply change the hostname
client = MongoClient("mongodb://heliosdb-server:27017/")
db = client.my_database
collection = db.my_collection

# All operations work as expected
result = collection.find({"status": "active"})
```
2. Compatibility Overview
2.1 Wire Protocol Support
| Protocol Feature | Status | Notes |
|---|---|---|
| OP_MSG | 100% | Modern protocol (MongoDB 3.6+) |
| OP_QUERY | 100% | Legacy support |
| OP_INSERT/UPDATE/DELETE | 100% | Legacy operations |
| BSON Encoding | 100% | All 19 BSON types |
| SCRAM-SHA-256 | 100% | Recommended authentication |
| TLS/SSL | 100% | Encrypted connections |
2.2 CRUD Operations
| Operation | Compatibility | Notes |
|---|---|---|
| insertOne/insertMany | 100% | Full support |
| find/findOne | 100% | All query operators |
| updateOne/updateMany | 100% | All update operators |
| deleteOne/deleteMany | 100% | Full support |
| replaceOne | 100% | Document replacement |
| bulkWrite | 100% | Ordered and unordered |
2.3 Query Operators
| Category | Coverage | Examples |
|---|---|---|
| Comparison | 100% | $eq, $gt, $lt, $in, $nin |
| Logical | 100% | $and, $or, $not, $nor |
| Element | 100% | $exists, $type |
| Evaluation | 100% | $regex, $text, $expr |
| Array | 100% | $all, $elemMatch, $size |
| Geospatial | 100% | $near, $geoWithin |
| Bitwise | 100% | $bitsAllSet, $bitsAnyClear |
2.4 Aggregation Pipeline
| Stage | Status | Notes |
|---|---|---|
| $match | Supported | Filter documents |
| $project | Supported | Reshape documents |
| $group | Supported | Group with accumulators |
| $sort | Supported | Multi-field sorting |
| $lookup | Supported | Left outer join |
| $graphLookup | Supported | Recursive lookup |
| $unwind | Supported | Deconstruct arrays |
| $facet | Supported | Multi-faceted results |
| $setWindowFields | Supported | Window functions |
| $merge | Supported | Write to collection |
2.5 Known Limitations
| Feature | Status | Alternative |
|---|---|---|
| GridFS | Not supported | Use blob storage or S3 integration |
| Capped collections | Not supported | Use TTL indexes |
| Server-side JavaScript | Not supported | Use SQL procedures |
| Map-reduce | Deprecated | Use aggregation pipeline |
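Since map-reduce is deprecated in favor of the aggregation pipeline, legacy jobs have to be rewritten. As a minimal sketch (the helper name `count_by_field_pipeline` is illustrative, not part of any driver), the classic `emit(doc.field, 1)` / `sum(values)` count-by-key pattern can be rebuilt as a plain pipeline list that runs unchanged on MongoDB and HeliosDB:

```python
# Hypothetical helper: rewrite a classic map-reduce "count by field" job
# as an equivalent aggregation pipeline (the documented replacement).
def count_by_field_pipeline(field, filter_query=None):
    """Build a pipeline that groups documents by `field` and counts them --
    the aggregation equivalent of emit(doc.field, 1) / sum(values)."""
    pipeline = []
    if filter_query:
        # Replaces the filtering that used to happen in the map function
        pipeline.append({"$match": filter_query})
    pipeline.append({"$group": {"_id": f"${field}", "count": {"$sum": 1}}})
    pipeline.append({"$sort": {"count": -1}})
    return pipeline

# Usage with any pymongo-compatible collection:
#   db.orders.aggregate(count_by_field_pipeline("status", {"year": 2025}))
```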
3. Pre-Migration Checklist
3.1 Assessment Phase
- Inventory databases and collections

```shell
# List all databases
mongosh --eval "db.adminCommand('listDatabases')"

# List collections in each database
mongosh mydb --eval "db.getCollectionNames()"
```

- Document data volume

```javascript
// Get collection statistics
db.collection.stats()

// Count documents
db.collection.countDocuments({})
```

- Identify GridFS usage

```javascript
// Check for GridFS collections
db.getCollectionNames().filter(n => n.endsWith('.files') || n.endsWith('.chunks'))
```

- Review index definitions

```javascript
// List all indexes
db.collection.getIndexes()
```

- Analyze query patterns

```javascript
// Enable profiler
db.setProfilingLevel(1, { slowms: 100 })

// Review slow queries
db.system.profile.find().sort({ ts: -1 }).limit(20)
```
3.2 Technical Requirements
- HeliosDB version: 7.0 or later recommended
- Storage capacity: 1.5x current MongoDB storage (for import headroom)
- Network connectivity: MongoDB port 27017 accessible
- Authentication: Prepare HeliosDB credentials
- Backup verified: Recent MongoDB backup tested for restore
3.3 Application Assessment
- Driver versions: Document current MongoDB driver versions
- Connection strings: Inventory all application connection strings
- Custom code: Review any server-side JavaScript or map-reduce usage
- Change stream consumers: Document change stream applications
- Test environment: Prepare isolated test environment
3.4 Risk Assessment
| Risk | Mitigation |
|---|---|
| Data loss | Comprehensive backup before migration |
| Application downtime | Plan maintenance window or dual-write strategy |
| Query incompatibility | Pre-test all application queries |
| Performance regression | Benchmark before and after |
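The dual-write strategy mentioned above can be prototyped with a thin wrapper that sends every write to both systems while reads stay on the MongoDB primary until cutover. This is an illustrative sketch only (the class name and callback are assumptions, not a HeliosDB API); any pymongo-compatible collection objects work:

```python
class DualWriteCollection:
    """Illustrative dual-write wrapper: writes go to both the MongoDB
    (primary) and HeliosDB (shadow) collections; reads stay on the
    primary until cutover."""

    def __init__(self, primary, shadow, on_shadow_error=None):
        self.primary = primary
        self.shadow = shadow
        # Shadow failures are reported, but never fail the primary write
        self.on_shadow_error = on_shadow_error or (lambda op, err: None)

    def insert_one(self, doc):
        result = self.primary.insert_one(doc)   # primary write must succeed
        try:
            self.shadow.insert_one(dict(doc))   # best-effort shadow write
        except Exception as err:
            self.on_shadow_error("insert_one", err)
        return result

    def find_one(self, query):
        return self.primary.find_one(query)     # reads stay on primary
```

The same pattern extends to `update_one`/`delete_one`; once validation passes, the application flips reads to the shadow and drops the wrapper.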
4. Conceptual Mapping
4.1 Collections to Tables
MongoDB collections map to HeliosDB tables with JSONB storage:
MongoDB Collection:
```javascript
// MongoDB: users collection
{
  "_id": ObjectId("507f1f77bcf86cd799439011"),
  "username": "johndoe",
  "email": "john@example.com",
  "profile": {
    "firstName": "John",
    "lastName": "Doe",
    "age": 30
  },
  "tags": ["developer", "mongodb"]
}
```

HeliosDB Table (Internal Representation):

```sql
-- Automatic table structure created by MongoDB protocol
CREATE TABLE users (
    _id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    data JSONB NOT NULL
);

-- The document is stored in the 'data' JSONB column
-- _id is extracted and indexed for fast lookups
```

Hybrid Schema (Optional Optimization):

```sql
-- For frequently queried fields, create hybrid schema
CREATE TABLE users (
    _id UUID PRIMARY KEY,
    username VARCHAR(100),
    email VARCHAR(255),
    profile JSONB,
    tags TEXT[],
    created_at TIMESTAMP DEFAULT NOW()
);

-- Create indexes on extracted fields
CREATE INDEX idx_users_username ON users(username);
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_tags ON users USING GIN(tags);
```

4.2 Documents to Rows with JSONB
| MongoDB Concept | HeliosDB Equivalent |
|---|---|
| Document | Row with JSONB column |
| Embedded document | Nested JSONB object |
| Array | JSONB array or PostgreSQL array |
| ObjectId | UUID (auto-converted) |
| ISODate | TIMESTAMP/TIMESTAMPTZ |
| NumberLong | BIGINT |
| NumberDecimal | NUMERIC |
| BinData | BYTEA |
Type Conversion Examples:
```javascript
// MongoDB document with various types
{
  "_id": ObjectId("507f1f77bcf86cd799439011"),
  "price": NumberDecimal("19.99"),
  "quantity": NumberLong(100),
  "created": ISODate("2025-01-01T00:00:00Z"),
  "metadata": BinData(0, "base64data=="),
  "location": { "type": "Point", "coordinates": [-73.97, 40.77] }
}
```

HeliosDB JSONB representation:

```json
{
  "_id": "507f1f77-bcf8-6cd7-9943-9011507f1f77",
  "price": 19.99,
  "quantity": 100,
  "created": "2025-01-01T00:00:00Z",
  "metadata": "\\x626173653634646174613d3d",
  "location": { "type": "Point", "coordinates": [-73.97, 40.77] }
}
```

4.3 Indexes
| MongoDB Index | HeliosDB Equivalent |
|---|---|
| Single field | B-tree index on JSONB path |
| Compound | Composite B-tree index |
| Unique | Unique constraint/index |
| TTL | TTL extension on JSONB |
| Text | GIN index with full-text search |
| 2dsphere | GiST geospatial index |
| Hashed | Hash index |
| Wildcard | GIN index on JSONB |
Index Migration Examples:
```javascript
// MongoDB indexes
db.users.createIndex({ "email": 1 }, { unique: true })
db.users.createIndex({ "profile.age": 1, "status": 1 })
db.users.createIndex({ "location": "2dsphere" })
db.users.createIndex({ "content": "text", "title": "text" })
db.users.createIndex({ "createdAt": 1 }, { expireAfterSeconds: 3600 })
```

```sql
-- HeliosDB equivalent indexes
CREATE UNIQUE INDEX idx_users_email ON users ((data->>'email'));
CREATE INDEX idx_users_age_status ON users ((data->'profile'->>'age'), (data->>'status'));
CREATE INDEX idx_users_location ON users USING GIST ((data->'location'));
CREATE INDEX idx_users_content ON users USING GIN (to_tsvector('english',
    COALESCE(data->>'content', '') || ' ' || COALESCE(data->>'title', '')));
-- TTL implemented via background job or HeliosDB TTL extension
```

4.4 Database and Collection Namespaces
| MongoDB | HeliosDB |
|---|---|
| Database | Schema |
| Collection | Table |
| mydb.users | mydb.users (schema.table) |
| admin | heliosdb_admin (system schema) |
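The namespace mapping above is mechanical, which makes it easy to automate in migration tooling. As a minimal sketch (the helper and the `SYSTEM_SCHEMA_MAP` name are illustrative assumptions, not HeliosDB APIs):

```python
# Hypothetical helper reflecting the table above:
# MongoDB "database.collection" -> HeliosDB (schema, table).
SYSTEM_SCHEMA_MAP = {"admin": "heliosdb_admin"}  # system-schema rename per the table

def map_namespace(namespace):
    """Split a MongoDB namespace at the first dot and apply the
    HeliosDB schema naming rules."""
    database, _, collection = namespace.partition(".")
    schema = SYSTEM_SCHEMA_MAP.get(database, database)
    return schema, collection
```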
5. Step-by-Step Migration Process
5.1 Schema Design in HeliosDB
Option A: Pure Document Mode (Simplest)
```sql
-- HeliosDB automatically creates this structure
-- when accessed via MongoDB protocol
CREATE SCHEMA IF NOT EXISTS mydb;

CREATE TABLE mydb.users (
    _id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    data JSONB NOT NULL
);

CREATE INDEX idx_users_data ON mydb.users USING GIN (data jsonb_path_ops);
```

Option B: Hybrid Schema (Recommended for Performance)

```sql
-- Design schema with frequently queried fields extracted
CREATE TABLE mydb.users (
    _id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    username VARCHAR(100) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    status VARCHAR(20) DEFAULT 'active',
    profile JSONB,
    tags TEXT[],
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- Indexes for common query patterns
CREATE INDEX idx_users_status ON mydb.users(status);
CREATE INDEX idx_users_created ON mydb.users(created_at DESC);
CREATE INDEX idx_users_profile ON mydb.users USING GIN (profile);
CREATE INDEX idx_users_tags ON mydb.users USING GIN (tags);
```

Option C: Full Relational Conversion (Best for Analytics)

```sql
-- Normalize embedded documents into separate tables
CREATE TABLE mydb.users (
    _id UUID PRIMARY KEY,
    username VARCHAR(100) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    status VARCHAR(20) DEFAULT 'active',
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE mydb.user_profiles (
    user_id UUID PRIMARY KEY REFERENCES mydb.users(_id),
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    age INTEGER,
    bio TEXT
);

CREATE TABLE mydb.user_tags (
    id SERIAL PRIMARY KEY,
    user_id UUID REFERENCES mydb.users(_id),
    tag VARCHAR(50)
);

CREATE INDEX idx_user_tags_user ON mydb.user_tags(user_id);
CREATE INDEX idx_user_tags_tag ON mydb.user_tags(tag);
```

5.2 Data Export Using mongodump/mongoexport
Method 1: mongodump (Binary, Recommended for Large Datasets)
```shell
# Export entire database
mongodump \
  --uri="mongodb://source-mongodb:27017/mydb" \
  --out=/backup/mongodb \
  --gzip

# Export specific collection
mongodump \
  --uri="mongodb://source-mongodb:27017/mydb" \
  --collection=users \
  --out=/backup/mongodb \
  --gzip

# Export with query filter (incremental)
mongodump \
  --uri="mongodb://source-mongodb:27017/mydb" \
  --collection=orders \
  --query='{"createdAt": {"$gte": {"$date": "2025-01-01T00:00:00Z"}}}' \
  --out=/backup/mongodb
```

Method 2: mongoexport (JSON/CSV, Good for Transformation)

```shell
# Export to JSON (extended JSON format)
mongoexport \
  --uri="mongodb://source-mongodb:27017/mydb" \
  --collection=users \
  --out=users.json \
  --jsonArray

# Export to CSV (for relational import)
mongoexport \
  --uri="mongodb://source-mongodb:27017/mydb" \
  --collection=users \
  --type=csv \
  --fields=_id,username,email,profile.firstName,profile.lastName \
  --out=users.csv

# Export with query filter
mongoexport \
  --uri="mongodb://source-mongodb:27017/mydb" \
  --collection=orders \
  --query='{"status": "completed"}' \
  --out=completed_orders.json
```

5.3 Data Transformation and Import
Direct Import via mongorestore (Fastest)
```shell
# Import directly to HeliosDB (uses MongoDB wire protocol)
mongorestore \
  --uri="mongodb://heliosdb-server:27017/mydb" \
  --gzip \
  /backup/mongodb/mydb

# Import specific collection
mongorestore \
  --uri="mongodb://heliosdb-server:27017/mydb" \
  --collection=users \
  --gzip \
  /backup/mongodb/mydb/users.bson.gz

# Import with options
mongorestore \
  --uri="mongodb://heliosdb-server:27017/mydb" \
  --drop \
  --numInsertionWorkers=4 \
  --batchSize=1000 \
  /backup/mongodb/mydb
```

JSON Import via SQL COPY

```shell
# Convert MongoDB extended JSON to standard JSON
cat users.json | jq -c '.[] | {_id: ._id."$oid", data: .}' > users_transformed.json

# Import into HeliosDB using COPY
psql -h heliosdb-server -U admin -d mydb \
  -c "COPY users (data) FROM STDIN WITH (FORMAT json);" < users_transformed.json
```

Python Migration Script (Complex Transformations)
```python
#!/usr/bin/env python3
"""
MongoDB to HeliosDB Migration Script
Handles complex data transformations during migration
"""

from pymongo import MongoClient
from datetime import datetime
import uuid

# Source MongoDB
source_client = MongoClient("mongodb://source-mongodb:27017/")
source_db = source_client.mydb

# Target HeliosDB (via MongoDB protocol)
target_client = MongoClient("mongodb://heliosdb-server:27017/")
target_db = target_client.mydb


def convert_objectid(oid):
    """Convert MongoDB ObjectId to UUID format"""
    hex_str = str(oid)
    # Pad to 32 characters for UUID
    padded = hex_str.ljust(32, '0')
    return uuid.UUID(padded[:8] + '-' + padded[8:12] + '-' +
                     padded[12:16] + '-' + padded[16:20] + '-' + padded[20:32])


def transform_document(doc):
    """Transform document for HeliosDB compatibility"""
    transformed = {}

    for key, value in doc.items():
        if key == '_id':
            # Convert ObjectId to UUID string
            transformed['_id'] = str(convert_objectid(value))
        elif isinstance(value, datetime):
            # Convert datetime to ISO string
            transformed[key] = value.isoformat()
        elif isinstance(value, dict):
            # Recursively transform nested documents
            transformed[key] = transform_document(value)
        elif isinstance(value, list):
            # Transform array elements
            transformed[key] = [
                transform_document(v) if isinstance(v, dict) else v
                for v in value
            ]
        else:
            transformed[key] = value

    return transformed


def migrate_collection(collection_name, batch_size=1000):
    """Migrate a single collection with batching"""
    source_collection = source_db[collection_name]
    target_collection = target_db[collection_name]

    total = source_collection.count_documents({})
    migrated = 0

    print(f"Migrating {collection_name}: {total} documents")

    cursor = source_collection.find().batch_size(batch_size)
    batch = []

    for doc in cursor:
        transformed = transform_document(doc)
        batch.append(transformed)

        if len(batch) >= batch_size:
            target_collection.insert_many(batch, ordered=False)
            migrated += len(batch)
            print(f"  Progress: {migrated}/{total} ({100*migrated/total:.1f}%)")
            batch = []

    # Insert remaining documents
    if batch:
        target_collection.insert_many(batch, ordered=False)
        migrated += len(batch)

    print(f"  Completed: {migrated} documents migrated")
    return migrated


def migrate_indexes(collection_name):
    """Recreate indexes in target collection"""
    source_collection = source_db[collection_name]
    target_collection = target_db[collection_name]

    indexes = source_collection.index_information()

    for name, info in indexes.items():
        if name == '_id_':
            continue  # Skip default _id index

        keys = info['key']
        options = {k: v for k, v in info.items() if k not in ('key', 'v', 'ns')}

        try:
            target_collection.create_index(keys, name=name, **options)
            print(f"  Created index: {name}")
        except Exception as e:
            print(f"  Warning: Could not create index {name}: {e}")


def main():
    collections = source_db.list_collection_names()

    # Filter out system collections
    collections = [c for c in collections if not c.startswith('system.')]

    print(f"Found {len(collections)} collections to migrate")

    for collection_name in collections:
        migrate_collection(collection_name)
        migrate_indexes(collection_name)

    print("\nMigration complete!")


if __name__ == "__main__":
    main()
```

5.4 Application Connection Changes
Python (PyMongo)
```python
# Before: MongoDB
from pymongo import MongoClient
client = MongoClient("mongodb://mongodb-server:27017/")

# After: HeliosDB (same code, different hostname)
from pymongo import MongoClient
client = MongoClient("mongodb://heliosdb-server:27017/")

# With authentication
client = MongoClient(
    "mongodb://heliosdb-server:27017/",
    username="myuser",
    password="mypassword",
    authSource="admin",
    authMechanism="SCRAM-SHA-256"
)

# With connection pooling and options
client = MongoClient(
    "mongodb://heliosdb-server:27017/",
    maxPoolSize=50,
    minPoolSize=10,
    maxIdleTimeMS=30000,
    serverSelectionTimeoutMS=5000
)
```

Node.js (mongodb driver)
```javascript
// Before: MongoDB
const { MongoClient } = require('mongodb');
const client = new MongoClient('mongodb://mongodb-server:27017/');

// After: HeliosDB
const client = new MongoClient('mongodb://heliosdb-server:27017/', {
  auth: { username: 'myuser', password: 'mypassword' },
  authSource: 'admin',
  maxPoolSize: 50
});
```

Java (MongoDB Driver)
```java
// Before: MongoDB
MongoClient client = MongoClients.create("mongodb://mongodb-server:27017/");

// After: HeliosDB
MongoClientSettings settings = MongoClientSettings.builder()
    .applyConnectionString(new ConnectionString("mongodb://heliosdb-server:27017/"))
    .credential(MongoCredential.createScramSha256Credential(
        "myuser", "admin", "mypassword".toCharArray()))
    .build();
MongoClient client = MongoClients.create(settings);
```

Environment-Based Configuration

```python
import os

# Use environment variable for easy switching
MONGODB_URI = os.environ.get(
    "MONGODB_URI",
    "mongodb://heliosdb-server:27017/"
)

client = MongoClient(MONGODB_URI)
```

6. Query Translation Examples
6.1 find() to SELECT
Basic Find
```javascript
// MongoDB
db.users.find({ status: "active" })
```

```sql
-- HeliosDB SQL equivalent
SELECT * FROM users WHERE data->>'status' = 'active';

-- Or with hybrid schema
SELECT * FROM users WHERE status = 'active';
```

Find with Projection

```javascript
// MongoDB
db.users.find(
  { status: "active" },
  { username: 1, email: 1, _id: 0 }
)
```

```sql
-- HeliosDB SQL
SELECT
  data->>'username' AS username,
  data->>'email' AS email
FROM users
WHERE data->>'status' = 'active';
```

Find with Operators

```javascript
// MongoDB: comparison operators
db.orders.find({
  amount: { $gte: 100, $lte: 500 },
  status: { $in: ["pending", "processing"] }
})
```

```sql
-- HeliosDB SQL
SELECT * FROM orders
WHERE (data->>'amount')::numeric BETWEEN 100 AND 500
  AND data->>'status' IN ('pending', 'processing');
```

Find with Nested Fields

```javascript
// MongoDB: nested document query
db.users.find({ "profile.age": { $gte: 21 } })
```

```sql
-- HeliosDB SQL
SELECT * FROM users
WHERE (data->'profile'->>'age')::integer >= 21;

-- Using jsonpath (more efficient)
SELECT * FROM users
WHERE data @@ '$.profile.age >= 21';
```

Find with Arrays

```javascript
// MongoDB: array queries
db.users.find({ tags: "mongodb" })                           // contains
db.users.find({ tags: { $all: ["mongodb", "developer"] } })  // contains all
db.users.find({ "items.price": { $gt: 100 } })               // array element match
```

```sql
-- HeliosDB SQL: array contains
SELECT * FROM users WHERE data->'tags' ? 'mongodb';

-- Contains all
SELECT * FROM users
WHERE data->'tags' ?& array['mongodb', 'developer'];

-- Array element match
SELECT * FROM users
WHERE EXISTS (
  SELECT 1 FROM jsonb_array_elements(data->'items') AS item
  WHERE (item->>'price')::numeric > 100
);
```

6.2 aggregate() to SQL with JSON Functions
Group and Count
```javascript
// MongoDB aggregation
db.orders.aggregate([
  { $match: { status: "completed" } },
  { $group: {
      _id: "$category",
      total: { $sum: "$amount" },
      count: { $sum: 1 },
      avgAmount: { $avg: "$amount" }
  }},
  { $sort: { total: -1 } },
  { $limit: 10 }
])
```

```sql
-- HeliosDB SQL
SELECT
  data->>'category' AS category,
  SUM((data->>'amount')::numeric) AS total,
  COUNT(*) AS count,
  AVG((data->>'amount')::numeric) AS avg_amount
FROM orders
WHERE data->>'status' = 'completed'
GROUP BY data->>'category'
ORDER BY total DESC
LIMIT 10;
```

Lookup (Join)

```javascript
// MongoDB $lookup
db.orders.aggregate([
  { $lookup: {
      from: "customers",
      localField: "customer_id",
      foreignField: "_id",
      as: "customer"
  }},
  { $unwind: "$customer" }
])
```

```sql
-- HeliosDB SQL (much simpler with native JOINs)
SELECT o.*, c.data AS customer
FROM orders o
JOIN customers c ON o.data->>'customer_id' = c._id::text;

-- Or with JSONB aggregation to mimic $unwind result
SELECT o.data || jsonb_build_object('customer', c.data) AS document
FROM orders o
JOIN customers c ON o.data->>'customer_id' = c._id::text;
```

Faceted Search

```javascript
// MongoDB $facet
db.products.aggregate([
  { $facet: {
      byCategory: [
        { $group: { _id: "$category", count: { $sum: 1 } } }
      ],
      priceRange: [
        { $bucket: {
            groupBy: "$price",
            boundaries: [0, 50, 100, 500],
            default: "Other"
        }}
      ],
      totalCount: [
        { $count: "count" }
      ]
  }}
])
```

```sql
-- HeliosDB SQL: Use separate queries or WITH clause
WITH
by_category AS (
  SELECT data->>'category' AS category, COUNT(*) AS count
  FROM products
  GROUP BY data->>'category'
),
price_range AS (
  SELECT
    CASE
      WHEN (data->>'price')::numeric < 50 THEN '0-50'
      WHEN (data->>'price')::numeric < 100 THEN '50-100'
      WHEN (data->>'price')::numeric < 500 THEN '100-500'
      ELSE 'Other'
    END AS range,
    COUNT(*) AS count
  FROM products
  GROUP BY 1
),
total AS (
  SELECT COUNT(*) AS count FROM products
)
SELECT
  jsonb_build_object(
    'byCategory', (SELECT jsonb_agg(row_to_json(by_category)) FROM by_category),
    'priceRange', (SELECT jsonb_agg(row_to_json(price_range)) FROM price_range),
    'totalCount', (SELECT count FROM total)
  ) AS facets;
```

6.3 update() to UPDATE with JSONB Operators
Basic Update
```javascript
// MongoDB
db.users.updateOne(
  { _id: ObjectId("...") },
  { $set: { status: "active" } }
)
```

```sql
-- HeliosDB SQL
UPDATE users
SET data = jsonb_set(data, '{status}', '"active"')
WHERE _id = '...'::uuid;

-- With hybrid schema
UPDATE users SET status = 'active' WHERE _id = '...'::uuid;
```

Increment

```javascript
// MongoDB
db.products.updateMany(
  { category: "electronics" },
  { $inc: { viewCount: 1, stock: -1 } }
)
```

```sql
-- HeliosDB SQL
UPDATE products
SET data = data || jsonb_build_object(
  'viewCount', COALESCE((data->>'viewCount')::integer, 0) + 1,
  'stock', COALESCE((data->>'stock')::integer, 0) - 1
)
WHERE data->>'category' = 'electronics';
```

Array Push

```javascript
// MongoDB
db.users.updateOne(
  { _id: ObjectId("...") },
  { $push: { tags: "premium" } }
)
```

```sql
-- HeliosDB SQL
UPDATE users
SET data = jsonb_set(
  data,
  '{tags}',
  COALESCE(data->'tags', '[]'::jsonb) || '"premium"'::jsonb
)
WHERE _id = '...'::uuid;
```

Array Pull

```javascript
// MongoDB
db.users.updateOne(
  { _id: ObjectId("...") },
  { $pull: { tags: "temporary" } }
)
```

```sql
-- HeliosDB SQL
UPDATE users
SET data = jsonb_set(
  data,
  '{tags}',
  (SELECT jsonb_agg(elem)
   FROM jsonb_array_elements(data->'tags') AS elem
   WHERE elem != '"temporary"')
)
WHERE _id = '...'::uuid;
```

Upsert

```javascript
// MongoDB
db.users.updateOne(
  { email: "new@example.com" },
  { $set: { name: "New User", createdAt: new Date() } },
  { upsert: true }
)
```

```sql
-- HeliosDB SQL
INSERT INTO users (data)
VALUES ('{"email": "new@example.com", "name": "New User", "createdAt": "..."}'::jsonb)
ON CONFLICT ((data->>'email'))
DO UPDATE SET data = users.data || EXCLUDED.data;
```

7. Feature Mapping
7.1 Aggregation Pipeline Equivalents
| MongoDB Stage | HeliosDB SQL Equivalent |
|---|---|
| $match | WHERE clause |
| $project | SELECT with JSON functions |
| $group | GROUP BY with aggregates |
| $sort | ORDER BY |
| $limit | LIMIT |
| $skip | OFFSET |
| $unwind | LATERAL jsonb_array_elements() |
| $lookup | JOIN |
| $graphLookup | WITH RECURSIVE |
| $bucket | CASE WHEN in GROUP BY |
| $facet | Multiple CTEs |
| $setWindowFields | Window functions |
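These stage-to-SQL mappings are regular enough that the simplest cases can be generated mechanically. As a minimal sketch, assuming the default JSONB layout with a `data` column (the helper name `match_to_where` is illustrative, and only top-level equality filters are handled), a `$match` document can be turned into a parameterized WHERE fragment:

```python
# Illustrative sketch: translate a flat MongoDB equality filter (the
# $match row above) into a WHERE clause over the JSONB 'data' column.
def match_to_where(filter_query):
    """Return (sql_fragment, params) for a simple {$match}-style filter.

    Only handles top-level field: value equality; operators like $gte
    or nested paths would need additional cases."""
    clauses, params = [], []
    for field, value in filter_query.items():
        clauses.append(f"data->>'{field}' = %s")  # text comparison on JSONB path
        params.append(str(value))
    return " AND ".join(clauses) or "TRUE", params

# Usage:
#   sql, params = match_to_where({"status": "active"})
#   cursor.execute(f"SELECT * FROM orders WHERE {sql}", params)
```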
7.2 Change Streams vs HeliosDB CDC
| Feature | MongoDB Change Streams | HeliosDB CDC |
|---|---|---|
| Protocol | Native MongoDB | Native + Debezium-compatible |
| Latency | ~100ms | <10ms |
| Resume | Token-based | LSN-based (more reliable) |
| Filtering | Pipeline stages | SQL predicates |
| Output | BSON | JSON, Avro, Protobuf |
| Destinations | Application code | Kafka, Webhooks, S3, etc. |
MongoDB Change Stream:
```javascript
const pipeline = [
  { $match: { operationType: "insert" } }
];

const changeStream = collection.watch(pipeline);
changeStream.on("change", (change) => {
  console.log(change);
});
```

HeliosDB CDC (via MongoDB protocol - identical API):

```javascript
// Same API works with HeliosDB
const changeStream = collection.watch(pipeline);
changeStream.on("change", (change) => {
  console.log(change);
});
```

HeliosDB CDC (direct SQL subscription):

```sql
-- Create CDC subscription
CREATE PUBLICATION users_changes FOR TABLE users;

-- Subscribe via logical replication
CREATE SUBSCRIPTION users_sub
  CONNECTION 'host=heliosdb-server dbname=mydb'
  PUBLICATION users_changes;
```

HeliosDB Webhook CDC:

```sql
-- Configure webhook notification
SELECT helios_cdc.create_webhook(
  'users',
  'https://myapp.com/webhooks/user-changes',
  '{"events": ["INSERT", "UPDATE", "DELETE"]}'
);
```

7.3 Geospatial Queries
| MongoDB | HeliosDB |
|---|---|
| 2dsphere index | GiST index with geography |
| $near | ST_DWithin + ORDER BY ST_Distance |
| $geoWithin | ST_Within |
| $geoIntersects | ST_Intersects |
| GeoJSON | Native GeoJSON support |
MongoDB Geospatial:
```javascript
// Create 2dsphere index
db.places.createIndex({ location: "2dsphere" })

// Find near point
db.places.find({
  location: {
    $near: {
      $geometry: { type: "Point", coordinates: [-73.97, 40.77] },
      $maxDistance: 1000
    }
  }
})

// Find within polygon
db.places.find({
  location: {
    $geoWithin: {
      $geometry: {
        type: "Polygon",
        coordinates: [[[-73.98, 40.76], [-73.96, 40.76],
                       [-73.96, 40.78], [-73.98, 40.78], [-73.98, 40.76]]]
      }
    }
  }
})
```

HeliosDB SQL Geospatial:

```sql
-- Create spatial index
CREATE INDEX idx_places_location ON places
  USING GIST (ST_GeomFromGeoJSON(data->>'location'));

-- Find near point (within 1000 meters)
SELECT * FROM places
WHERE ST_DWithin(
  ST_GeomFromGeoJSON(data->>'location')::geography,
  ST_SetSRID(ST_MakePoint(-73.97, 40.77), 4326)::geography,
  1000
)
ORDER BY ST_Distance(
  ST_GeomFromGeoJSON(data->>'location')::geography,
  ST_SetSRID(ST_MakePoint(-73.97, 40.77), 4326)::geography
);

-- Find within polygon
SELECT * FROM places
WHERE ST_Within(
  ST_GeomFromGeoJSON(data->>'location'),
  ST_GeomFromGeoJSON('{
    "type": "Polygon",
    "coordinates": [[[-73.98, 40.76], [-73.96, 40.76],
                     [-73.96, 40.78], [-73.98, 40.78], [-73.98, 40.76]]]
  }')
);
```

7.4 Full-Text Search
| MongoDB | HeliosDB |
|---|---|
| Text index | GIN index with tsvector |
| $text search | to_tsvector/to_tsquery |
| Score sorting | ts_rank |
| Language support | 30+ languages |
| Stemming | Built-in |
MongoDB Text Search:
```javascript
// Create text index
db.articles.createIndex({ title: "text", content: "text" })

// Search
db.articles.find(
  { $text: { $search: "mongodb database" } },
  { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } })
```

HeliosDB Full-Text Search:

```sql
-- Create full-text index
CREATE INDEX idx_articles_fts ON articles
  USING GIN (to_tsvector('english',
    COALESCE(data->>'title', '') || ' ' || COALESCE(data->>'content', '')));

-- Search with ranking
SELECT *,
  ts_rank(
    to_tsvector('english',
      COALESCE(data->>'title', '') || ' ' || COALESCE(data->>'content', '')),
    plainto_tsquery('english', 'mongodb database')
  ) AS score
FROM articles
WHERE to_tsvector('english',
    COALESCE(data->>'title', '') || ' ' || COALESCE(data->>'content', ''))
  @@ plainto_tsquery('english', 'mongodb database')
ORDER BY score DESC;
```

HeliosDB GraphRAG Semantic Search:

```sql
-- Enable semantic search (AI-powered)
SELECT helios_graphrag.enable_semantic_search('articles', 'content');

-- Semantic search
SELECT * FROM helios_graphrag.semantic_search(
  'articles',
  'How do document databases handle transactions?',
  10  -- top 10 results
);
```

8. Post-Migration Validation
8.1 Data Integrity Checks
```javascript
// Compare document counts
// Source MongoDB
const sourceCount = db.users.countDocuments({});
console.log(`Source count: ${sourceCount}`);

// Target HeliosDB
const targetCount = db.users.countDocuments({});
console.log(`Target count: ${targetCount}`);

// Verify counts match
if (sourceCount !== targetCount) {
  console.error(`Count mismatch! Source: ${sourceCount}, Target: ${targetCount}`);
}
```

```python
def validate_migration(source_db, target_db, collection_name, sample_size=1000):
    """Validate migrated data integrity"""
    source = source_db[collection_name]
    target = target_db[collection_name]

    # Count validation
    source_count = source.count_documents({})
    target_count = target.count_documents({})

    if source_count != target_count:
        print(f"ERROR: Count mismatch - Source: {source_count}, Target: {target_count}")
        return False

    # Sample validation
    sample = list(source.aggregate([{"$sample": {"size": sample_size}}]))

    for doc in sample:
        target_doc = target.find_one({"_id": doc["_id"]})
        if target_doc is None:
            print(f"ERROR: Document {doc['_id']} not found in target")
            return False

        # Compare key fields (adjust based on your schema)
        for key in ["username", "email", "status"]:
            if doc.get(key) != target_doc.get(key):
                print(f"ERROR: Field {key} mismatch for {doc['_id']}")
                return False

    print(f"Validation passed: {source_count} documents verified")
    return True
```

8.2 Query Validation
```python
from datetime import datetime

def validate_queries(source_db, target_db, collection_name):
    """Run test queries on both databases and compare results"""

    test_queries = [
        # Basic find
        {"status": "active"},
        # Range query
        {"created_at": {"$gte": datetime(2025, 1, 1)}},
        # Nested field
        {"profile.age": {"$gte": 21}},
        # Array query
        {"tags": {"$in": ["premium", "vip"]}},
    ]

    source = source_db[collection_name]
    target = target_db[collection_name]

    for query in test_queries:
        source_results = list(source.find(query).limit(100))
        target_results = list(target.find(query).limit(100))

        if len(source_results) != len(target_results):
            print(f"Query result count mismatch for {query}")
            print(f"  Source: {len(source_results)}, Target: {len(target_results)}")
        else:
            print(f"Query {query}: OK ({len(source_results)} results)")
```

8.3 Index Validation
```javascript
// Compare indexes
function compareIndexes(sourceDb, targetDb, collectionName) {
  const sourceIndexes = sourceDb[collectionName].getIndexes();
  const targetIndexes = targetDb[collectionName].getIndexes();

  const sourceNames = sourceIndexes.map(i => i.name).sort();
  const targetNames = targetIndexes.map(i => i.name).sort();

  console.log("Source indexes:", sourceNames);
  console.log("Target indexes:", targetNames);

  // Check for missing indexes
  for (const name of sourceNames) {
    if (!targetNames.includes(name)) {
      console.log(`WARNING: Index ${name} missing in target`);
    }
  }
}
```

8.4 Performance Validation
```python
import time

def benchmark_queries(db, collection_name, iterations=100):
    """Benchmark common query patterns"""
    collection = db[collection_name]

    # sample_id: an existing _id value fetched from the collection beforehand
    benchmarks = {
        "find_by_id": lambda: collection.find_one({"_id": sample_id}),
        "find_by_index": lambda: list(collection.find({"status": "active"}).limit(10)),
        "aggregation": lambda: list(collection.aggregate([
            {"$match": {"status": "active"}},
            {"$group": {"_id": "$category", "count": {"$sum": 1}}}
        ])),
        "text_search": lambda: list(collection.find(
            {"$text": {"$search": "search terms"}}
        ).limit(10)),
    }

    results = {}
    for name, query_func in benchmarks.items():
        times = []
        for _ in range(iterations):
            start = time.time()
            query_func()
            times.append(time.time() - start)

        results[name] = {
            "avg_ms": sum(times) / len(times) * 1000,
            "min_ms": min(times) * 1000,
            "max_ms": max(times) * 1000
        }
        print(f"{name}: avg={results[name]['avg_ms']:.2f}ms")

    return results
```

8.5 Application Testing Checklist
- Unit tests pass against HeliosDB
- Integration tests pass with new connection strings
- CRUD operations work correctly
- Aggregation pipelines return expected results
- Change streams receive events
- Transactions commit and rollback properly
- Authentication works with all user accounts
- Connection pooling handles load correctly
- Error handling catches expected exceptions
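The CRUD item in the checklist above lends itself to automation. A minimal smoke-test sketch (the function name is illustrative; it assumes only a pymongo-compatible collection object, so the same test runs against MongoDB before migration and HeliosDB after):

```python
def crud_smoke_test(collection):
    """Minimal CRUD round-trip from the checklist above: insert, read
    back, update, and delete one probe document. Raises AssertionError
    on the first mismatch."""
    probe = {"_id": "smoke-test-1", "status": "probe"}

    collection.insert_one(dict(probe))
    found = collection.find_one({"_id": "smoke-test-1"})
    assert found is not None and found["status"] == "probe", "read-back failed"

    collection.update_one({"_id": "smoke-test-1"}, {"$set": {"status": "updated"}})
    assert collection.find_one({"_id": "smoke-test-1"})["status"] == "updated"

    collection.delete_one({"_id": "smoke-test-1"})
    assert collection.find_one({"_id": "smoke-test-1"}) is None, "delete failed"
    return True
```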
9. Performance Considerations
9.1 Index Optimization
```sql
-- Analyze query patterns
EXPLAIN ANALYZE SELECT * FROM users WHERE data->>'status' = 'active';

-- Create expression indexes for frequent queries
CREATE INDEX idx_users_status ON users ((data->>'status'));
CREATE INDEX idx_users_email ON users ((data->>'email'));

-- Create GIN index for flexible JSON queries
CREATE INDEX idx_users_data ON users USING GIN (data jsonb_path_ops);

-- For range queries on numeric fields
CREATE INDEX idx_orders_amount ON orders (((data->>'amount')::numeric));

-- Composite index for multi-field queries
CREATE INDEX idx_orders_status_date ON orders (
  (data->>'status'),
  ((data->>'created_at')::timestamptz)
);
```

9.2 Query Optimization
```sql
-- Use JSONB containment for exact matches (uses GIN index)
SELECT * FROM users WHERE data @> '{"status": "active"}';

-- Use jsonpath for complex queries
SELECT * FROM users
WHERE data @@ '$.profile.age > 21 && $.tags[*] == "premium"';

-- Avoid full table scans with proper indexes
-- BAD: No index usage
SELECT * FROM users WHERE data->>'name' ILIKE '%john%';

-- GOOD: Use full-text search
SELECT * FROM users
WHERE to_tsvector('english', data->>'name') @@ to_tsquery('john');
```

9.3 Bulk Operations
```python
# Batch inserts for better performance
def bulk_insert(collection, documents, batch_size=1000):
    """Insert documents in batches for optimal performance"""
    for i in range(0, len(documents), batch_size):
        batch = documents[i:i + batch_size]
        collection.insert_many(batch, ordered=False)
        print(f"Inserted batch {i//batch_size + 1}")
```

```sql
-- Use COPY for large data imports (SQL mode)
COPY users (data) FROM '/path/to/data.json' WITH (FORMAT json);

-- Disable indexes during bulk load, rebuild after
ALTER INDEX idx_users_data DISABLE;
-- ... bulk insert ...
REINDEX INDEX idx_users_data;
```

9.4 Connection Pooling
```python
# Configure connection pooling
from pymongo import MongoClient

client = MongoClient(
    "mongodb://heliosdb-server:27017/",
    maxPoolSize=100,               # Maximum connections
    minPoolSize=10,                # Minimum connections to maintain
    maxIdleTimeMS=30000,           # Close idle connections after 30s
    waitQueueTimeoutMS=5000,       # Timeout waiting for a connection
    socketTimeoutMS=30000,         # Socket operation timeout
    serverSelectionTimeoutMS=5000  # Server selection timeout
)
```

9.5 Memory and Cache Configuration
```sql
-- Configure HeliosDB for document workloads
ALTER SYSTEM SET work_mem = '256MB';             -- For complex aggregations
ALTER SYSTEM SET shared_buffers = '8GB';         -- For caching
ALTER SYSTEM SET effective_cache_size = '24GB';  -- Query planner hint

-- Enable intelligent caching for hot collections
ALTER TABLE users ENABLE INTELLIGENT_CACHING;
```

10. Common Issues and Troubleshooting
10.1 Connection Issues
Problem: Connection refused
```
pymongo.errors.ServerSelectionTimeoutError: heliosdb-server:27017: [Errno 111] Connection refused
```

Solution:
```shell
# Check if HeliosDB is listening on the MongoDB port
netstat -tlnp | grep 27017

# Verify the MongoDB protocol is enabled
psql -c "SHOW helios.mongodb_port;"

# Check firewall rules
sudo firewall-cmd --list-ports | grep 27017
```

Problem: Authentication failed
```
pymongo.errors.OperationFailure: Authentication failed
```

Solution:
```python
# Ensure correct auth mechanism
client = MongoClient(
    "mongodb://user:password@heliosdb-server:27017/",
    authMechanism="SCRAM-SHA-256",  # Ensure this matches server config
    authSource="admin"
)
```

10.2 Query Compatibility Issues
Problem: Unsupported operator
```
pymongo.errors.OperationFailure: Unknown operator: $customOp
```

Solution:
```python
# Check the compatibility matrix and use an alternative.
# Instead of custom operators, use $expr with aggregation expressions.

# Instead of
db.collection.find({"$customOp": value})

# Use
db.collection.find({"$expr": {"$eq": ["$field", value]}})
```

Problem: Query returns different results
Solution:
```python
# Check for type mismatches:
# MongoDB may have stored numbers as strings.

# Debug: Check the actual stored type
doc = collection.find_one()
print(type(doc.get('amount')))  # string vs number?

# Fix: Ensure consistent types during migration
if isinstance(doc['amount'], str):
    collection.update_one(
        {"_id": doc["_id"]},
        {"$set": {"amount": float(doc['amount'])}}
    )
```

10.3 Performance Issues
Problem: Slow queries after migration
Solution:
```sql
-- Check if indexes exist
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'users';

-- Analyze table statistics
ANALYZE users;

-- Check the query plan
EXPLAIN ANALYZE SELECT * FROM users WHERE data->>'status' = 'active';

-- Create missing indexes
CREATE INDEX CONCURRENTLY idx_users_status ON users ((data->>'status'));
```

Problem: High memory usage
Solution:
```sql
-- Check total relation size
SELECT pg_size_pretty(pg_total_relation_size('users')) AS total_size;

-- Find large documents
SELECT _id, pg_column_size(data) AS size
FROM users
ORDER BY size DESC
LIMIT 10;

-- Consider splitting large documents or using a hybrid schema
```

10.4 Data Type Issues
Problem: ObjectId conversion errors
Solution:
```python
from bson import ObjectId
import uuid

def objectid_to_uuid(oid):
    """Convert a MongoDB ObjectId (24 hex chars) to a UUID."""
    hex_str = str(oid)
    # Pad to 32 hex characters
    padded = hex_str.ljust(32, '0')
    return uuid.UUID(padded[:8] + '-' + padded[8:12] + '-' +
                     padded[12:16] + '-' + padded[16:20] + '-' +
                     padded[20:32])
```

Problem: Date/time precision loss
Solution:
```python
# MongoDB stores dates as milliseconds since epoch.
# Ensure proper conversion.
from datetime import datetime

def convert_date(mongo_date):
    """Preserve date precision during migration"""
    if isinstance(mongo_date, datetime):
        return mongo_date.isoformat()
    return mongo_date
```

10.5 Change Stream Issues
Problem: Change stream not receiving events
Solution:
```python
# Ensure the collection has CDC enabled. Check via SQL:
#   SELECT * FROM pg_publication_tables WHERE pubname = 'mongodb_cdc';

# Restart the change stream with explicit options
change_stream = collection.watch(
    full_document='updateLookup',  # Include full document on updates
    max_await_time_ms=1000
)
```

10.6 Transaction Issues
Problem: Transaction aborted unexpectedly
Solution:
```python
from pymongo.read_concern import ReadConcern
from pymongo.write_concern import WriteConcern

# Increase the transaction timeout
with client.start_session() as session:
    with session.start_transaction(
        max_commit_time_ms=30000,  # 30 second timeout
        read_concern=ReadConcern("snapshot"),
        write_concern=WriteConcern("majority")
    ):
        # Transaction operations
        pass
```

10.7 Debugging Tools
```python
# Enable MongoDB driver debugging
import logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger('pymongo')
logger.setLevel(logging.DEBUG)
```

```sql
-- Enable HeliosDB query logging
ALTER SYSTEM SET log_statement = 'all';
ALTER SYSTEM SET log_min_duration_statement = 100;  -- Log queries > 100ms
SELECT pg_reload_conf();

-- View recent statement statistics
SELECT * FROM pg_stat_statements ORDER BY total_time DESC LIMIT 20;
```

Appendix A: Quick Reference Card
Connection String Mapping
| MongoDB | HeliosDB |
|---|---|
| `mongodb://host:27017/` | `mongodb://host:27017/` |
| `mongodb+srv://cluster/` | `mongodb://heliosdb:27017/` |
| `authSource=admin` | `authSource=admin` |
| `replicaSet=rs0` | Not needed (built-in HA) |
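When many services share configuration, the mapping above can be applied mechanically. A toy standard-library sketch — the target host `heliosdb-server:27017` is an illustrative placeholder:

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

def to_helios_uri(mongo_uri, helios_host="heliosdb-server:27017"):
    """Rewrite a MongoDB connection string for HeliosDB per the table above.

    mongodb+srv collapses to plain mongodb, replicaSet is dropped
    (HeliosDB has built-in HA), credentials and authSource are preserved.
    """
    parts = urlsplit(mongo_uri)
    # Preserve user:password@ if present
    userinfo = ""
    if "@" in parts.netloc:
        userinfo = parts.netloc.split("@", 1)[0] + "@"
    # Drop replicaSet; keep all other options (e.g. authSource)
    params = [(k, v) for k, v in parse_qsl(parts.query) if k != "replicaSet"]
    query = ("?" + urlencode(params)) if params else ""
    return f"mongodb://{userinfo}{helios_host}{parts.path or '/'}{query}"
```

For example, `mongodb+srv://user:pw@cluster.example.net/mydb?replicaSet=rs0&authSource=admin` becomes `mongodb://user:pw@heliosdb-server:27017/mydb?authSource=admin`.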
Common Query Translation
| MongoDB | HeliosDB SQL |
|---|---|
| `find({})` | `SELECT * FROM collection` |
| `find({a: 1})` | `WHERE data->>'a' = '1'` |
| `find({a: {$gt: 1}})` | `WHERE (data->>'a')::int > 1` |
| `find({a: {$in: [1,2]}})` | `WHERE data->>'a' IN ('1','2')` |
| `updateOne({}, {$set: {a: 1}})` | `UPDATE collection SET data = jsonb_set(data, '{a}', '1')` |
| `aggregate([{$group: ...}])` | `SELECT ... GROUP BY ...` |
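To make the correspondence concrete, here is a toy translator covering only the filter operators in this table. It is purely illustrative — it handles only top-level scalar fields and does no quoting or escaping of untrusted input, so real queries should go through a driver rather than string building:

```python
def filter_to_where(filt):
    """Translate a tiny subset of MQL filters to HeliosDB SQL WHERE clauses.

    Toy sketch: supports equality, $gt (assumed integer), and $in on
    top-level fields only, mirroring the rows of the reference table.
    """
    clauses = []
    for field, cond in filt.items():
        if isinstance(cond, dict):
            if "$gt" in cond:
                clauses.append(f"(data->>'{field}')::int > {cond['$gt']}")
            elif "$in" in cond:
                vals = ",".join(f"'{v}'" for v in cond["$in"])
                clauses.append(f"data->>'{field}' IN ({vals})")
        else:
            clauses.append(f"data->>'{field}' = '{cond}'")
    return " AND ".join(clauses) or "TRUE"
```

For instance, `filter_to_where({"a": {"$gt": 1}})` yields `(data->>'a')::int > 1`, matching the third row of the table.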
Useful HeliosDB JSON Functions
| Function | Purpose |
|---|---|
| `data->>'field'` | Extract field as text |
| `data->'field'` | Extract field as JSONB |
| `data @> '{}'` | JSONB containment |
| `data ? 'key'` | Key exists |
| `jsonb_set()` | Set nested value |
| `jsonb_array_elements()` | Unwind array |
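For intuition, `jsonb_set()`-style nested updates behave like this pure-Python analogue — a sketch of the default overwrite-in-place behavior only, not the full SQL function:

```python
import copy

def jsonb_set_py(doc, path, value):
    """Pure-Python analogue of jsonb_set(): return a copy of `doc` with the
    value at `path` (a list of keys) replaced. Like the SQL function, the
    original document is not mutated."""
    out = copy.deepcopy(doc)
    node = out
    for key in path[:-1]:
        node = node[key]
    node[path[-1]] = value
    return out
```

So `jsonb_set_py({"a": {"b": 1}}, ["a", "b"], 2)` returns `{"a": {"b": 2}}` while leaving the input untouched, mirroring `jsonb_set(data, '{a,b}', '2')`.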
Appendix B: Migration Checklist Summary
Pre-Migration
- Database inventory completed
- Data volume estimated
- GridFS usage identified and alternative planned
- Backup created and verified
- Test environment prepared
Migration
- Schema designed (pure document, hybrid, or relational)
- Data exported from MongoDB
- Data imported to HeliosDB
- Indexes recreated
- Connection strings updated
Post-Migration
- Document counts verified
- Sample data validated
- All queries tested
- Performance benchmarked
- Application tests passed
- Change streams working
- Monitoring configured
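The first two post-migration items can be scripted. A minimal count-comparison helper — in practice the counts would come from `count_documents({})` on each side, but they are plain dicts here so the check itself is testable offline:

```python
def verify_counts(source_counts, target_counts):
    """Compare per-collection document counts between MongoDB (source)
    and HeliosDB (target); return a list of (name, expected, actual)
    tuples for any collection that does not match."""
    mismatches = []
    for name, expected in source_counts.items():
        actual = target_counts.get(name)
        if actual != expected:
            mismatches.append((name, expected, actual))
    return mismatches
```

An empty result means every source collection arrived with the expected number of documents; a missing target collection shows up with `actual=None`.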
Related Documentation
- MongoDB Protocol README - Protocol overview
- MongoDB Compatibility Matrix - Feature support details
- MongoDB Examples - Code examples
- General Migration Guide - Cross-protocol migration
- MySQL Migration Guide - SQL migration reference
Last Updated: 2026-01-04
Migration Guide Version: 1.0
HeliosDB Compatibility: 7.0+