
MongoDB to HeliosDB Migration Guide

Version: 1.0 Last Updated: 2026-01-04


Table of Contents

  1. Introduction
  2. Compatibility Overview
  3. Pre-Migration Checklist
  4. Conceptual Mapping
  5. Step-by-Step Migration Process
  6. Query Translation Examples
  7. Feature Mapping
  8. Post-Migration Validation
  9. Performance Considerations
  10. Common Issues and Troubleshooting

1. Introduction

1.1 Why Migrate from MongoDB to HeliosDB?

HeliosDB offers significant advantages over MongoDB while maintaining wire protocol compatibility:

| Aspect | MongoDB | HeliosDB |
| --- | --- | --- |
| Data Model | Document-only | Unified multi-model (Document, Relational, Graph, Vector, Time-Series) |
| Query Language | MongoDB Query Language only | MQL, SQL, GraphQL, CQL, Redis commands |
| Transactions | Snapshot isolation | SERIALIZABLE isolation with zero-maintenance MVCC |
| Joins | $lookup only | Native SQL JOINs + $lookup |
| Analytics | Limited aggregation | Full OLAP + OLTP (HTAP) |
| Full-Text Search | External (Atlas Search) | Built-in with GraphRAG integration |
| Change Streams | Standard | CDC-backed with lower latency |
| Vector Search | Atlas Vector Search | Native HNSW with multi-modal embeddings |
| Licensing | SSPL | Enterprise-friendly |
| Sharding | Manual configuration | Transparent horizontal scaling |

1.2 Migration Benefits

  • Unified Platform: Consolidate multiple database systems into one
  • SQL Access: Query MongoDB-style documents with standard SQL
  • Enhanced Analytics: Run complex analytical queries on document data
  • Better Transactions: ACID guarantees with serializable isolation
  • Cost Reduction: Single database license, reduced operational overhead
  • GraphRAG Integration: AI-powered semantic search on document content

1.3 Wire Protocol Compatibility

HeliosDB implements the MongoDB wire protocol (OP_MSG), allowing existing MongoDB applications to connect without code changes:

# Existing MongoDB code works unchanged
from pymongo import MongoClient
# Simply change the hostname
client = MongoClient("mongodb://heliosdb-server:27017/")
db = client.my_database
collection = db.my_collection
# All operations work as expected
result = collection.find({"status": "active"})

2. Compatibility Overview

2.1 Wire Protocol Support

| Protocol Feature | Status | Notes |
| --- | --- | --- |
| OP_MSG | 100% | Modern protocol (MongoDB 3.6+) |
| OP_QUERY | 100% | Legacy support |
| OP_INSERT/UPDATE/DELETE | 100% | Legacy operations |
| BSON Encoding | 100% | All 19 BSON types |
| SCRAM-SHA-256 | 100% | Recommended authentication |
| TLS/SSL | 100% | Encrypted connections |

2.2 CRUD Operations

| Operation | Compatibility | Notes |
| --- | --- | --- |
| insertOne/insertMany | 100% | Full support |
| find/findOne | 100% | All query operators |
| updateOne/updateMany | 100% | All update operators |
| deleteOne/deleteMany | 100% | Full support |
| replaceOne | 100% | Document replacement |
| bulkWrite | 100% | Ordered and unordered |

2.3 Query Operators

| Category | Coverage | Examples |
| --- | --- | --- |
| Comparison | 100% | $eq, $gt, $lt, $in, $nin |
| Logical | 100% | $and, $or, $not, $nor |
| Element | 100% | $exists, $type |
| Evaluation | 100% | $regex, $text, $expr |
| Array | 100% | $all, $elemMatch, $size |
| Geospatial | 100% | $near, $geoWithin |
| Bitwise | 100% | $bitsAllSet, $bitsAnyClear |
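
Because every operator category above is supported, queries that mix several of them should run unchanged once the connection string points at HeliosDB. A minimal smoke test, assuming the placeholder host, database, and field names used elsewhere in this guide:

from pymongo import MongoClient

client = MongoClient("mongodb://heliosdb-server:27017/")
orders = client.mydb.orders

# Combines comparison, logical, element, and array operators in one query
cursor = orders.find({
    "$and": [
        {"amount": {"$gte": 100, "$lt": 1000}},                       # comparison
        {"$or": [{"status": "pending"}, {"status": "processing"}]},   # logical
        {"shipping_address": {"$exists": True}},                      # element
        {"tags": {"$all": ["priority", "gift"]}},                     # array
    ]
}).limit(5)
for doc in cursor:
    print(doc["_id"], doc.get("status"))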

2.4 Aggregation Pipeline

| Stage | Status | Notes |
| --- | --- | --- |
| $match | Supported | Filter documents |
| $project | Supported | Reshape documents |
| $group | Supported | Group with accumulators |
| $sort | Supported | Multi-field sorting |
| $lookup | Supported | Left outer join |
| $graphLookup | Supported | Recursive lookup |
| $unwind | Supported | Deconstruct arrays |
| $facet | Supported | Multi-faceted results |
| $setWindowFields | Supported | Window functions |
| $merge | Supported | Write to collection |
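
The same applies to pipeline stages: existing aggregations should execute as-is. A short, hedged sketch reusing the placeholder orders collection (field names are illustrative):

from pymongo import MongoClient

client = MongoClient("mongodb://heliosdb-server:27017/")
orders = client.mydb.orders

# $match, $unwind, $group, $sort, and $limit combined in one pipeline
pipeline = [
    {"$match": {"status": "completed"}},
    {"$unwind": "$items"},
    {"$group": {"_id": "$items.sku", "units": {"$sum": "$items.qty"}}},
    {"$sort": {"units": -1}},
    {"$limit": 5},
]
for row in orders.aggregate(pipeline):
    print(row)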

2.5 Known Limitations

| Feature | Status | Alternative |
| --- | --- | --- |
| GridFS | Not supported | Use blob storage or S3 integration |
| Capped collections | Not supported | Use TTL indexes |
| Server-side JavaScript | Not supported | Use SQL procedures |
| Map-reduce | Deprecated | Use aggregation pipeline |
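
GridFS content in particular must be moved out before migration. The sketch below, which assumes the standard gridfs driver package and uses a local directory as a stand-in for your blob store or S3 bucket, exports each file and records a reference document (a hypothetical file_refs collection) so applications can still locate it:

import pathlib
import gridfs
from pymongo import MongoClient

client = MongoClient("mongodb://source-mongodb:27017/")
db = client.mydb
fs = gridfs.GridFS(db)

# Export every GridFS file; replace the filesystem write with an S3/object-storage upload
out_dir = pathlib.Path("/backup/gridfs")
out_dir.mkdir(parents=True, exist_ok=True)

for grid_file in fs.find():
    target = out_dir / f"{grid_file._id}_{grid_file.filename or 'unnamed'}"
    target.write_bytes(grid_file.read())
    # Reference document pointing at the exported blob
    db.file_refs.insert_one({
        "gridfs_id": str(grid_file._id),
        "filename": grid_file.filename,
        "length": grid_file.length,
        "path": str(target),
    })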

3. Pre-Migration Checklist

3.1 Assessment Phase

  • Inventory databases and collections

    # List all databases
    mongosh --eval "db.adminCommand('listDatabases')"
    # List collections in each database
    mongosh mydb --eval "db.getCollectionNames()"
  • Document data volume

    // Get collection statistics
    db.collection.stats()
    // Count documents
    db.collection.countDocuments({})
  • Identify GridFS usage

    // Check for GridFS collections
    db.getCollectionNames().filter(n => n.endsWith('.files') || n.endsWith('.chunks'))
  • Review index definitions

    // List all indexes
    db.collection.getIndexes()
  • Analyze query patterns

    // Enable profiler
    db.setProfilingLevel(1, { slowms: 100 })
    // Review slow queries
    db.system.profile.find().sort({ ts: -1 }).limit(20)

3.2 Technical Requirements

  • HeliosDB version: 7.0 or later recommended
  • Storage capacity: 1.5x current MongoDB storage (for import headroom)
  • Network connectivity: MongoDB port 27017 accessible
  • Authentication: Prepare HeliosDB credentials
  • Backup verified: Recent MongoDB backup tested for restore

3.3 Application Assessment

  • Driver versions: Document current MongoDB driver versions
  • Connection strings: Inventory all application connection strings
  • Custom code: Review any server-side JavaScript or map-reduce usage (see the scan sketch after this list)
  • Change stream consumers: Document change stream applications
  • Test environment: Prepare isolated test environment
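
The custom-code review is easy to automate. A rough, hedged scan script (file extensions and patterns are assumptions; extend them for your codebase):

import pathlib
import re

# Features HeliosDB does not support (see section 2.5)
PATTERNS = {
    "map-reduce": re.compile(r"mapReduce|map_reduce"),
    "server-side JavaScript": re.compile(r"\$where|db\.eval"),
    "GridFS": re.compile(r"GridFS|GridFSBucket|gridfs"),
    "capped collections": re.compile(r"capped\s*[:=]\s*[Tt]rue"),
}

def scan_source_tree(root):
    """Print every line in the application source that uses a flagged feature."""
    for path in pathlib.Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in {".py", ".js", ".ts", ".java"}:
            continue
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            for feature, pattern in PATTERNS.items():
                if pattern.search(line):
                    print(f"{path}:{lineno}: uses {feature}")

if __name__ == "__main__":
    scan_source_tree("./src")  # adjust to your application's source root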

3.4 Risk Assessment

| Risk | Mitigation |
| --- | --- |
| Data loss | Comprehensive backup before migration |
| Application downtime | Plan maintenance window or dual-write strategy (see the sketch after this table) |
| Query incompatibility | Pre-test all application queries |
| Performance regression | Benchmark before and after |
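
For the downtime risk, one common approach is a temporary dual-write layer in the application so MongoDB stays authoritative while HeliosDB is validated. A minimal sketch (class, collection, and host names are illustrative, not part of HeliosDB itself):

from pymongo import MongoClient
from pymongo.errors import PyMongoError

class DualWriteCollection:
    """Write to both databases during the transition; read only from the current primary."""

    def __init__(self, primary, secondary):
        self.primary = primary      # existing MongoDB collection
        self.secondary = secondary  # HeliosDB collection (same wire protocol)

    def insert_one(self, doc):
        result = self.primary.insert_one(doc)  # also assigns _id to doc
        try:
            self.secondary.insert_one(doc)     # reuse the same _id in HeliosDB
        except PyMongoError as exc:
            # Log and reconcile later instead of failing the user-facing write
            print(f"dual-write insert failed on HeliosDB: {exc}")
        return result

    def find(self, *args, **kwargs):
        return self.primary.find(*args, **kwargs)

mongo_users = MongoClient("mongodb://mongodb-server:27017/").mydb.users
helios_users = MongoClient("mongodb://heliosdb-server:27017/").mydb.users
users = DualWriteCollection(mongo_users, helios_users)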

4. Conceptual Mapping

4.1 Collections to Tables

MongoDB collections map to HeliosDB tables with JSONB storage:

MongoDB Collection:

// MongoDB: users collection
{
"_id": ObjectId("507f1f77bcf86cd799439011"),
"username": "johndoe",
"email": "john@example.com",
"profile": {
"firstName": "John",
"lastName": "Doe",
"age": 30
},
"tags": ["developer", "mongodb"]
}

HeliosDB Table (Internal Representation):

-- Automatic table structure created by MongoDB protocol
CREATE TABLE users (
_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
data JSONB NOT NULL
);
-- The document is stored in the 'data' JSONB column
-- _id is extracted and indexed for fast lookups

Hybrid Schema (Optional Optimization):

-- For frequently queried fields, create hybrid schema
CREATE TABLE users (
_id UUID PRIMARY KEY,
username VARCHAR(100),
email VARCHAR(255),
profile JSONB,
tags TEXT[],
created_at TIMESTAMP DEFAULT NOW()
);
-- Create indexes on extracted fields
CREATE INDEX idx_users_username ON users(username);
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_tags ON users USING GIN(tags);

4.2 Documents to Rows with JSONB

| MongoDB Concept | HeliosDB Equivalent |
| --- | --- |
| Document | Row with JSONB column |
| Embedded document | Nested JSONB object |
| Array | JSONB array or PostgreSQL array |
| ObjectId | UUID (auto-converted) |
| ISODate | TIMESTAMP/TIMESTAMPTZ |
| NumberLong | BIGINT |
| NumberDecimal | NUMERIC |
| BinData | BYTEA |

Type Conversion Examples:

// MongoDB document with various types
{
"_id": ObjectId("507f1f77bcf86cd799439011"),
"price": NumberDecimal("19.99"),
"quantity": NumberLong(100),
"created": ISODate("2025-01-01T00:00:00Z"),
"metadata": BinData(0, "base64data=="),
"location": {
"type": "Point",
"coordinates": [-73.97, 40.77]
}
}
-- HeliosDB JSONB representation
{
"_id": "507f1f77-bcf8-6cd7-9943-9011507f1f77",
"price": 19.99,
"quantity": 100,
"created": "2025-01-01T00:00:00Z",
"metadata": "\\x626173653634646174613d3d",
"location": {
"type": "Point",
"coordinates": [-73.97, 40.77]
}
}

4.3 Indexes

| MongoDB Index | HeliosDB Equivalent |
| --- | --- |
| Single field | B-tree index on JSONB path |
| Compound | Composite B-tree index |
| Unique | Unique constraint/index |
| TTL | TTL extension on JSONB |
| Text | GIN index with full-text search |
| 2dsphere | GiST geospatial index |
| Hashed | Hash index |
| Wildcard | GIN index on JSONB |

Index Migration Examples:

// MongoDB indexes
db.users.createIndex({ "email": 1 }, { unique: true })
db.users.createIndex({ "profile.age": 1, "status": 1 })
db.users.createIndex({ "location": "2dsphere" })
db.users.createIndex({ "content": "text", "title": "text" })
db.users.createIndex({ "createdAt": 1 }, { expireAfterSeconds: 3600 })
-- HeliosDB equivalent indexes
CREATE UNIQUE INDEX idx_users_email ON users ((data->>'email'));
CREATE INDEX idx_users_age_status ON users ((data->'profile'->>'age'), (data->>'status'));
CREATE INDEX idx_users_location ON users USING GIST ((data->'location'));
CREATE INDEX idx_users_content ON users USING GIN (to_tsvector('english',
COALESCE(data->>'content', '') || ' ' || COALESCE(data->>'title', '')));
-- TTL implemented via background job or HeliosDB TTL extension

4.4 Database and Collection Namespaces

| MongoDB | HeliosDB |
| --- | --- |
| Database | Schema |
| Collection | Table |
| mydb.users | mydb.users (schema.table) |
| admin | heliosdb_admin (system schema) |

5. Step-by-Step Migration Process

5.1 Schema Design in HeliosDB

Option A: Pure Document Mode (Simplest)

-- HeliosDB automatically creates this structure
-- when accessed via MongoDB protocol
CREATE SCHEMA IF NOT EXISTS mydb;
CREATE TABLE mydb.users (
_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
data JSONB NOT NULL
);
CREATE INDEX idx_users_data ON mydb.users USING GIN (data jsonb_path_ops);

Option B: Hybrid Schema (Extracted Fields + JSONB)

-- Design schema with frequently queried fields extracted
CREATE TABLE mydb.users (
_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
username VARCHAR(100) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
status VARCHAR(20) DEFAULT 'active',
profile JSONB,
tags TEXT[],
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Indexes for common query patterns
CREATE INDEX idx_users_status ON mydb.users(status);
CREATE INDEX idx_users_created ON mydb.users(created_at DESC);
CREATE INDEX idx_users_profile ON mydb.users USING GIN (profile);
CREATE INDEX idx_users_tags ON mydb.users USING GIN (tags);

Option C: Full Relational Conversion (Best for Analytics)

-- Normalize embedded documents into separate tables
CREATE TABLE mydb.users (
_id UUID PRIMARY KEY,
username VARCHAR(100) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
status VARCHAR(20) DEFAULT 'active',
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE mydb.user_profiles (
user_id UUID PRIMARY KEY REFERENCES mydb.users(_id),
first_name VARCHAR(100),
last_name VARCHAR(100),
age INTEGER,
bio TEXT
);
CREATE TABLE mydb.user_tags (
id SERIAL PRIMARY KEY,
user_id UUID REFERENCES mydb.users(_id),
tag VARCHAR(50)
);
CREATE INDEX idx_user_tags_user ON mydb.user_tags(user_id);
CREATE INDEX idx_user_tags_tag ON mydb.user_tags(tag);

5.2 Data Export Using mongodump/mongoexport

Method 1: mongodump (BSON Export)

# Export entire database
mongodump \
--uri="mongodb://source-mongodb:27017/mydb" \
--out=/backup/mongodb \
--gzip
# Export specific collection
mongodump \
--uri="mongodb://source-mongodb:27017/mydb" \
--collection=users \
--out=/backup/mongodb \
--gzip
# Export with query filter (incremental)
mongodump \
--uri="mongodb://source-mongodb:27017/mydb" \
--collection=orders \
--query='{"createdAt": {"$gte": {"$date": "2025-01-01T00:00:00Z"}}}' \
--out=/backup/mongodb

Method 2: mongoexport (JSON/CSV, Good for Transformation)

# Export to JSON (extended JSON format)
mongoexport \
--uri="mongodb://source-mongodb:27017/mydb" \
--collection=users \
--out=users.json \
--jsonArray
# Export to CSV (for relational import)
mongoexport \
--uri="mongodb://source-mongodb:27017/mydb" \
--collection=users \
--type=csv \
--fields=_id,username,email,profile.firstName,profile.lastName \
--out=users.csv
# Export with query filter
mongoexport \
--uri="mongodb://source-mongodb:27017/mydb" \
--collection=orders \
--query='{"status": "completed"}' \
--out=completed_orders.json

5.3 Data Transformation and Import

Direct Import via mongorestore (Fastest)

# Import directly to HeliosDB (uses MongoDB wire protocol)
mongorestore \
--uri="mongodb://heliosdb-server:27017/mydb" \
--gzip \
/backup/mongodb/mydb
# Import specific collection
mongorestore \
--uri="mongodb://heliosdb-server:27017/mydb" \
--collection=users \
--gzip \
/backup/mongodb/mydb/users.bson.gz
# Import with options
mongorestore \
--uri="mongodb://heliosdb-server:27017/mydb" \
--drop \
--numInsertionWorkers=4 \
--batchSize=1000 \
/backup/mongodb/mydb

JSON Import via SQL COPY

# Convert MongoDB extended JSON to standard JSON
cat users.json | jq -c '.[] | {_id: ._id."$oid", data: .}' > users_transformed.json
# Import into HeliosDB using COPY
psql -h heliosdb-server -U admin -d mydb -c "
COPY users (data) FROM STDIN WITH (FORMAT json);
" < users_transformed.json

Python Migration Script (Complex Transformations)

#!/usr/bin/env python3
"""
MongoDB to HeliosDB Migration Script
Handles complex data transformations during migration
"""
from pymongo import MongoClient
from datetime import datetime
import uuid
import json

# Source MongoDB
source_client = MongoClient("mongodb://source-mongodb:27017/")
source_db = source_client.mydb

# Target HeliosDB (via MongoDB protocol)
target_client = MongoClient("mongodb://heliosdb-server:27017/")
target_db = target_client.mydb


def convert_objectid(oid):
    """Convert MongoDB ObjectId to UUID format"""
    hex_str = str(oid)
    # Pad to 32 characters for UUID
    padded = hex_str.ljust(32, '0')
    return uuid.UUID(padded[:8] + '-' + padded[8:12] + '-' +
                     padded[12:16] + '-' + padded[16:20] + '-' + padded[20:32])


def transform_document(doc):
    """Transform document for HeliosDB compatibility"""
    transformed = {}
    for key, value in doc.items():
        if key == '_id':
            # Convert ObjectId to UUID string
            transformed['_id'] = str(convert_objectid(value))
        elif isinstance(value, datetime):
            # Convert datetime to ISO string
            transformed[key] = value.isoformat()
        elif isinstance(value, dict):
            # Recursively transform nested documents
            transformed[key] = transform_document(value)
        elif isinstance(value, list):
            # Transform array elements
            transformed[key] = [
                transform_document(v) if isinstance(v, dict) else v
                for v in value
            ]
        else:
            transformed[key] = value
    return transformed


def migrate_collection(collection_name, batch_size=1000):
    """Migrate a single collection with batching"""
    source_collection = source_db[collection_name]
    target_collection = target_db[collection_name]
    total = source_collection.count_documents({})
    migrated = 0
    print(f"Migrating {collection_name}: {total} documents")
    cursor = source_collection.find().batch_size(batch_size)
    batch = []
    for doc in cursor:
        transformed = transform_document(doc)
        batch.append(transformed)
        if len(batch) >= batch_size:
            target_collection.insert_many(batch, ordered=False)
            migrated += len(batch)
            print(f"  Progress: {migrated}/{total} ({100*migrated/total:.1f}%)")
            batch = []
    # Insert remaining documents
    if batch:
        target_collection.insert_many(batch, ordered=False)
        migrated += len(batch)
    print(f"  Completed: {migrated} documents migrated")
    return migrated


def migrate_indexes(collection_name):
    """Recreate indexes in target collection"""
    source_collection = source_db[collection_name]
    target_collection = target_db[collection_name]
    indexes = source_collection.index_information()
    for name, info in indexes.items():
        if name == '_id_':
            continue  # Skip default _id index
        keys = info['key']
        options = {k: v for k, v in info.items()
                   if k not in ('key', 'v', 'ns')}
        try:
            target_collection.create_index(keys, name=name, **options)
            print(f"  Created index: {name}")
        except Exception as e:
            print(f"  Warning: Could not create index {name}: {e}")


def main():
    collections = source_db.list_collection_names()
    # Filter out system collections
    collections = [c for c in collections
                   if not c.startswith('system.')]
    print(f"Found {len(collections)} collections to migrate")
    for collection_name in collections:
        migrate_collection(collection_name)
        migrate_indexes(collection_name)
    print("\nMigration complete!")


if __name__ == "__main__":
    main()

5.4 Application Connection Changes

Python (PyMongo)

# Before: MongoDB
from pymongo import MongoClient
client = MongoClient("mongodb://mongodb-server:27017/")
# After: HeliosDB (same code, different hostname)
from pymongo import MongoClient
client = MongoClient("mongodb://heliosdb-server:27017/")
# With authentication
client = MongoClient(
"mongodb://heliosdb-server:27017/",
username="myuser",
password="mypassword",
authSource="admin",
authMechanism="SCRAM-SHA-256"
)
# With connection pooling and options
client = MongoClient(
"mongodb://heliosdb-server:27017/",
maxPoolSize=50,
minPoolSize=10,
maxIdleTimeMS=30000,
serverSelectionTimeoutMS=5000
)

Node.js (mongodb driver)

// Before: MongoDB
const { MongoClient } = require('mongodb');
const client = new MongoClient('mongodb://mongodb-server:27017/');
// After: HeliosDB
const client = new MongoClient('mongodb://heliosdb-server:27017/', {
auth: {
username: 'myuser',
password: 'mypassword'
},
authSource: 'admin',
maxPoolSize: 50
});

Java (MongoDB Driver)

// Before: MongoDB
MongoClient client = MongoClients.create("mongodb://mongodb-server:27017/");
// After: HeliosDB
MongoClientSettings settings = MongoClientSettings.builder()
.applyConnectionString(new ConnectionString("mongodb://heliosdb-server:27017/"))
.credential(MongoCredential.createScramSha256Credential(
"myuser", "admin", "mypassword".toCharArray()))
.build();
MongoClient client = MongoClients.create(settings);

Environment-Based Configuration

import os
# Use environment variable for easy switching
MONGODB_URI = os.environ.get(
"MONGODB_URI",
"mongodb://heliosdb-server:27017/"
)
client = MongoClient(MONGODB_URI)

6. Query Translation Examples

6.1 find() to SELECT

Basic Find

// MongoDB
db.users.find({ status: "active" })
-- HeliosDB SQL equivalent
SELECT * FROM users WHERE data->>'status' = 'active';
-- Or with hybrid schema
SELECT * FROM users WHERE status = 'active';

Find with Projection

// MongoDB
db.users.find(
{ status: "active" },
{ username: 1, email: 1, _id: 0 }
)
-- HeliosDB SQL
SELECT
data->>'username' AS username,
data->>'email' AS email
FROM users
WHERE data->>'status' = 'active';

Find with Operators

// MongoDB: comparison operators
db.orders.find({
amount: { $gte: 100, $lte: 500 },
status: { $in: ["pending", "processing"] }
})
-- HeliosDB SQL
SELECT * FROM orders
WHERE (data->>'amount')::numeric BETWEEN 100 AND 500
AND data->>'status' IN ('pending', 'processing');

Find with Nested Fields

// MongoDB: nested document query
db.users.find({ "profile.age": { $gte: 21 } })
-- HeliosDB SQL
SELECT * FROM users
WHERE (data->'profile'->>'age')::integer >= 21;
-- Using jsonpath (more efficient)
SELECT * FROM users
WHERE data @@ '$.profile.age >= 21';

Find with Arrays

// MongoDB: array queries
db.users.find({ tags: "mongodb" }) // contains
db.users.find({ tags: { $all: ["mongodb", "developer"] } }) // contains all
db.users.find({ "items.price": { $gt: 100 } }) // array element match
-- HeliosDB SQL: array contains
SELECT * FROM users WHERE data->'tags' ? 'mongodb';
-- Contains all
SELECT * FROM users
WHERE data->'tags' ?& array['mongodb', 'developer'];
-- Array element match
SELECT * FROM users
WHERE EXISTS (
SELECT 1 FROM jsonb_array_elements(data->'items') AS item
WHERE (item->>'price')::numeric > 100
);

6.2 aggregate() to SQL with JSON Functions

Group and Count

// MongoDB aggregation
db.orders.aggregate([
{ $match: { status: "completed" } },
{ $group: {
_id: "$category",
total: { $sum: "$amount" },
count: { $sum: 1 },
avgAmount: { $avg: "$amount" }
}},
{ $sort: { total: -1 } },
{ $limit: 10 }
])
-- HeliosDB SQL
SELECT
data->>'category' AS category,
SUM((data->>'amount')::numeric) AS total,
COUNT(*) AS count,
AVG((data->>'amount')::numeric) AS avg_amount
FROM orders
WHERE data->>'status' = 'completed'
GROUP BY data->>'category'
ORDER BY total DESC
LIMIT 10;

Lookup (Join)

// MongoDB $lookup
db.orders.aggregate([
{ $lookup: {
from: "customers",
localField: "customer_id",
foreignField: "_id",
as: "customer"
}},
{ $unwind: "$customer" }
])
-- HeliosDB SQL (much simpler with native JOINs)
SELECT
o.*,
c.data AS customer
FROM orders o
JOIN customers c ON o.data->>'customer_id' = c._id::text;
-- Or with JSONB aggregation to mimic $unwind result
SELECT
o.data || jsonb_build_object('customer', c.data) AS document
FROM orders o
JOIN customers c ON o.data->>'customer_id' = c._id::text;

Facet (Multi-Faceted Results)

// MongoDB $facet
db.products.aggregate([
{ $facet: {
byCategory: [
{ $group: { _id: "$category", count: { $sum: 1 } } }
],
priceRange: [
{ $bucket: {
groupBy: "$price",
boundaries: [0, 50, 100, 500],
default: "Other"
}}
],
totalCount: [
{ $count: "count" }
]
}}
])
-- HeliosDB SQL: Use separate queries or WITH clause
WITH
by_category AS (
SELECT data->>'category' AS category, COUNT(*) AS count
FROM products
GROUP BY data->>'category'
),
price_range AS (
SELECT
CASE
WHEN (data->>'price')::numeric < 50 THEN '0-50'
WHEN (data->>'price')::numeric < 100 THEN '50-100'
WHEN (data->>'price')::numeric < 500 THEN '100-500'
ELSE 'Other'
END AS range,
COUNT(*) AS count
FROM products
GROUP BY 1
),
total AS (
SELECT COUNT(*) AS count FROM products
)
SELECT
jsonb_build_object(
'byCategory', (SELECT jsonb_agg(row_to_json(by_category)) FROM by_category),
'priceRange', (SELECT jsonb_agg(row_to_json(price_range)) FROM price_range),
'totalCount', (SELECT count FROM total)
) AS facets;

6.3 update() to UPDATE with JSONB Operators

Basic Update

// MongoDB
db.users.updateOne(
{ _id: ObjectId("...") },
{ $set: { status: "active" } }
)
-- HeliosDB SQL
UPDATE users
SET data = jsonb_set(data, '{status}', '"active"')
WHERE _id = '...'::uuid;
-- With hybrid schema
UPDATE users SET status = 'active' WHERE _id = '...'::uuid;

Increment

// MongoDB
db.products.updateMany(
{ category: "electronics" },
{ $inc: { viewCount: 1, stock: -1 } }
)
-- HeliosDB SQL
UPDATE products
SET data = data || jsonb_build_object(
'viewCount', COALESCE((data->>'viewCount')::integer, 0) + 1,
'stock', COALESCE((data->>'stock')::integer, 0) - 1
)
WHERE data->>'category' = 'electronics';

Array Push

// MongoDB
db.users.updateOne(
{ _id: ObjectId("...") },
{ $push: { tags: "premium" } }
)
-- HeliosDB SQL
UPDATE users
SET data = jsonb_set(
data,
'{tags}',
COALESCE(data->'tags', '[]'::jsonb) || '"premium"'::jsonb
)
WHERE _id = '...'::uuid;

Array Pull

// MongoDB
db.users.updateOne(
{ _id: ObjectId("...") },
{ $pull: { tags: "temporary" } }
)
-- HeliosDB SQL
UPDATE users
SET data = jsonb_set(
data,
'{tags}',
(SELECT jsonb_agg(elem)
FROM jsonb_array_elements(data->'tags') AS elem
WHERE elem != '"temporary"')
)
WHERE _id = '...'::uuid;

Upsert

// MongoDB
db.users.updateOne(
{ email: "new@example.com" },
{ $set: { name: "New User", createdAt: new Date() } },
{ upsert: true }
)
-- HeliosDB SQL
INSERT INTO users (data)
VALUES ('{"email": "new@example.com", "name": "New User", "createdAt": "..."}'::jsonb)
ON CONFLICT ((data->>'email'))
DO UPDATE SET data = users.data || EXCLUDED.data;

7. Feature Mapping

7.1 Aggregation Pipeline Equivalents

| MongoDB Stage | HeliosDB SQL Equivalent |
| --- | --- |
| $match | WHERE clause |
| $project | SELECT with JSON functions |
| $group | GROUP BY with aggregates |
| $sort | ORDER BY |
| $limit | LIMIT |
| $skip | OFFSET |
| $unwind | LATERAL jsonb_array_elements() |
| $lookup | JOIN |
| $graphLookup | WITH RECURSIVE |
| $bucket | CASE WHEN in GROUP BY |
| $facet | Multiple CTEs |
| $setWindowFields | Window functions |

7.2 Change Streams vs HeliosDB CDC

| Feature | MongoDB Change Streams | HeliosDB CDC |
| --- | --- | --- |
| Protocol | Native MongoDB | Native + Debezium-compatible |
| Latency | ~100ms | <10ms |
| Resume | Token-based | LSN-based (more reliable) |
| Filtering | Pipeline stages | SQL predicates |
| Output | BSON | JSON, Avro, Protobuf |
| Destinations | Application code | Kafka, Webhooks, S3, etc. |

MongoDB Change Stream:

const pipeline = [
{ $match: { operationType: "insert" } }
];
const changeStream = collection.watch(pipeline);
changeStream.on("change", (change) => {
console.log(change);
});

HeliosDB CDC (via MongoDB protocol - identical API):

// Same API works with HeliosDB
const changeStream = collection.watch(pipeline);
changeStream.on("change", (change) => {
console.log(change);
});

HeliosDB CDC (direct SQL subscription):

-- Create CDC subscription
CREATE PUBLICATION users_changes FOR TABLE users;
-- Subscribe via logical replication
CREATE SUBSCRIPTION users_sub
CONNECTION 'host=heliosdb-server dbname=mydb'
PUBLICATION users_changes;

HeliosDB Webhook CDC:

-- Configure webhook notification
SELECT helios_cdc.create_webhook(
'users',
'https://myapp.com/webhooks/user-changes',
'{"events": ["INSERT", "UPDATE", "DELETE"]}'
);

7.3 Geospatial Queries

| MongoDB | HeliosDB |
| --- | --- |
| 2dsphere index | GiST index with geography |
| $near | ST_DWithin + ORDER BY ST_Distance |
| $geoWithin | ST_Within |
| $geoIntersects | ST_Intersects |
| GeoJSON | Native GeoJSON support |

MongoDB Geospatial:

// Create 2dsphere index
db.places.createIndex({ location: "2dsphere" })
// Find near point
db.places.find({
location: {
$near: {
$geometry: { type: "Point", coordinates: [-73.97, 40.77] },
$maxDistance: 1000
}
}
})
// Find within polygon
db.places.find({
location: {
$geoWithin: {
$geometry: {
type: "Polygon",
coordinates: [[[-73.98, 40.76], [-73.96, 40.76], [-73.96, 40.78], [-73.98, 40.78], [-73.98, 40.76]]]
}
}
}
})

HeliosDB SQL Geospatial:

-- Create spatial index
CREATE INDEX idx_places_location ON places
USING GIST (ST_GeomFromGeoJSON(data->>'location'));
-- Find near point (within 1000 meters)
SELECT * FROM places
WHERE ST_DWithin(
ST_GeomFromGeoJSON(data->>'location')::geography,
ST_SetSRID(ST_MakePoint(-73.97, 40.77), 4326)::geography,
1000
)
ORDER BY ST_Distance(
ST_GeomFromGeoJSON(data->>'location')::geography,
ST_SetSRID(ST_MakePoint(-73.97, 40.77), 4326)::geography
);
-- Find within polygon
SELECT * FROM places
WHERE ST_Within(
ST_GeomFromGeoJSON(data->>'location'),
ST_GeomFromGeoJSON('{
"type": "Polygon",
"coordinates": [[[-73.98, 40.76], [-73.96, 40.76], [-73.96, 40.78], [-73.98, 40.78], [-73.98, 40.76]]]
}')
);

7.4 Full-Text Search

| MongoDB | HeliosDB |
| --- | --- |
| Text index | GIN index with tsvector |
| $text search | to_tsvector/to_tsquery |
| Score sorting | ts_rank |
| Language support | 30+ languages |
| Stemming | Built-in |

MongoDB Text Search:

// Create text index
db.articles.createIndex({ title: "text", content: "text" })
// Search
db.articles.find(
{ $text: { $search: "mongodb database" } },
{ score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } })

HeliosDB Full-Text Search:

-- Create full-text index
CREATE INDEX idx_articles_fts ON articles
USING GIN (to_tsvector('english',
COALESCE(data->>'title', '') || ' ' || COALESCE(data->>'content', '')));
-- Search with ranking
SELECT
*,
ts_rank(
to_tsvector('english', COALESCE(data->>'title', '') || ' ' || COALESCE(data->>'content', '')),
plainto_tsquery('english', 'mongodb database')
) AS score
FROM articles
WHERE to_tsvector('english', COALESCE(data->>'title', '') || ' ' || COALESCE(data->>'content', ''))
@@ plainto_tsquery('english', 'mongodb database')
ORDER BY score DESC;

HeliosDB GraphRAG Semantic Search:

-- Enable semantic search (AI-powered)
SELECT helios_graphrag.enable_semantic_search('articles', 'content');
-- Semantic search
SELECT * FROM helios_graphrag.semantic_search(
'articles',
'How do document databases handle transactions?',
10 -- top 10 results
);

8. Post-Migration Validation

8.1 Data Integrity Checks

// Compare document counts
// Source MongoDB
const sourceCount = db.users.countDocuments({});
console.log(`Source count: ${sourceCount}`);
// Target HeliosDB
const targetCount = db.users.countDocuments({});
console.log(`Target count: ${targetCount}`);
// Verify counts match
if (sourceCount !== targetCount) {
console.error(`Count mismatch! Source: ${sourceCount}, Target: ${targetCount}`);
}

def validate_migration(source_db, target_db, collection_name, sample_size=1000):
    """Validate migrated data integrity"""
    source = source_db[collection_name]
    target = target_db[collection_name]
    # Count validation
    source_count = source.count_documents({})
    target_count = target.count_documents({})
    if source_count != target_count:
        print(f"ERROR: Count mismatch - Source: {source_count}, Target: {target_count}")
        return False
    # Sample validation
    sample = list(source.aggregate([{"$sample": {"size": sample_size}}]))
    for doc in sample:
        target_doc = target.find_one({"_id": doc["_id"]})
        if target_doc is None:
            print(f"ERROR: Document {doc['_id']} not found in target")
            return False
        # Compare key fields (adjust based on your schema)
        for key in ["username", "email", "status"]:
            if doc.get(key) != target_doc.get(key):
                print(f"ERROR: Field {key} mismatch for {doc['_id']}")
                return False
    print(f"Validation passed: {source_count} documents verified")
    return True

8.2 Query Validation

from datetime import datetime

def validate_queries(source_db, target_db, collection_name):
    """Run test queries on both databases and compare results"""
    test_queries = [
        # Basic find
        {"status": "active"},
        # Range query
        {"created_at": {"$gte": datetime(2025, 1, 1)}},
        # Nested field
        {"profile.age": {"$gte": 21}},
        # Array query
        {"tags": {"$in": ["premium", "vip"]}},
    ]
    source = source_db[collection_name]
    target = target_db[collection_name]
    for query in test_queries:
        source_results = list(source.find(query).limit(100))
        target_results = list(target.find(query).limit(100))
        if len(source_results) != len(target_results):
            print(f"Query result count mismatch for {query}")
            print(f"  Source: {len(source_results)}, Target: {len(target_results)}")
        else:
            print(f"Query {query}: OK ({len(source_results)} results)")

8.3 Index Validation

// Compare indexes
function compareIndexes(sourceDb, targetDb, collectionName) {
const sourceIndexes = sourceDb[collectionName].getIndexes();
const targetIndexes = targetDb[collectionName].getIndexes();
const sourceNames = sourceIndexes.map(i => i.name).sort();
const targetNames = targetIndexes.map(i => i.name).sort();
console.log("Source indexes:", sourceNames);
console.log("Target indexes:", targetNames);
// Check for missing indexes
for (const name of sourceNames) {
if (!targetNames.includes(name)) {
console.log(`WARNING: Index ${name} missing in target`);
}
}
}

8.4 Performance Validation

import time

def benchmark_queries(db, collection_name, iterations=100):
    """Benchmark common query patterns"""
    collection = db[collection_name]
    # Grab an existing _id so the point-lookup benchmark has a real target
    sample_id = collection.find_one()["_id"]
    benchmarks = {
        "find_by_id": lambda: collection.find_one({"_id": sample_id}),
        "find_by_index": lambda: list(collection.find({"status": "active"}).limit(10)),
        "aggregation": lambda: list(collection.aggregate([
            {"$match": {"status": "active"}},
            {"$group": {"_id": "$category", "count": {"$sum": 1}}}
        ])),
        "text_search": lambda: list(collection.find(
            {"$text": {"$search": "search terms"}}
        ).limit(10)),
    }
    results = {}
    for name, query_func in benchmarks.items():
        times = []
        for _ in range(iterations):
            start = time.time()
            query_func()
            times.append(time.time() - start)
        results[name] = {
            "avg_ms": sum(times) / len(times) * 1000,
            "min_ms": min(times) * 1000,
            "max_ms": max(times) * 1000
        }
        print(f"{name}: avg={results[name]['avg_ms']:.2f}ms")
    return results

8.5 Application Testing Checklist

  • Unit tests pass against HeliosDB (a smoke-test sketch follows this list)
  • Integration tests pass with new connection strings
  • CRUD operations work correctly
  • Aggregation pipelines return expected results
  • Change streams receive events
  • Transactions commit and rollback properly
  • Authentication works with all user accounts
  • Connection pooling handles load correctly
  • Error handling catches expected exceptions
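
A small automated smoke test covering the first few checklist items can run in CI against both databases. A pytest-style sketch (connection string, database, and field names are placeholders for your environment):

import pytest
from pymongo import MongoClient

@pytest.fixture(scope="module")
def collection():
    client = MongoClient("mongodb://heliosdb-server:27017/", serverSelectionTimeoutMS=5000)
    coll = client.test_migration.smoke
    coll.delete_many({})
    yield coll
    client.drop_database("test_migration")

def test_crud_round_trip(collection):
    inserted = collection.insert_one({"name": "alice", "status": "active"})
    assert collection.find_one({"_id": inserted.inserted_id})["name"] == "alice"
    collection.update_one({"_id": inserted.inserted_id}, {"$set": {"status": "inactive"}})
    assert collection.count_documents({"status": "inactive"}) == 1
    collection.delete_one({"_id": inserted.inserted_id})
    assert collection.count_documents({}) == 0

def test_aggregation(collection):
    collection.insert_many([{"cat": "a", "n": 1}, {"cat": "a", "n": 2}, {"cat": "b", "n": 3}])
    totals = {row["_id"]: row["total"] for row in collection.aggregate(
        [{"$group": {"_id": "$cat", "total": {"$sum": "$n"}}}])}
    assert totals == {"a": 3, "b": 6}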

9. Performance Considerations

9.1 Index Optimization

-- Analyze query patterns
EXPLAIN ANALYZE SELECT * FROM users WHERE data->>'status' = 'active';
-- Create expression indexes for frequent queries
CREATE INDEX idx_users_status ON users ((data->>'status'));
CREATE INDEX idx_users_email ON users ((data->>'email'));
-- Create GIN index for flexible JSON queries
CREATE INDEX idx_users_data ON users USING GIN (data jsonb_path_ops);
-- For range queries on numeric fields
CREATE INDEX idx_orders_amount ON orders (((data->>'amount')::numeric));
-- Composite index for multi-field queries
CREATE INDEX idx_orders_status_date ON orders (
(data->>'status'),
((data->>'created_at')::timestamptz)
);

9.2 Query Optimization

-- Use JSONB containment for exact matches (uses GIN index)
SELECT * FROM users WHERE data @> '{"status": "active"}';
-- Use jsonpath for complex queries
SELECT * FROM users
WHERE data @@ '$.profile.age > 21 && $.tags[*] == "premium"';
-- Avoid full table scans with proper indexes
-- BAD: No index usage
SELECT * FROM users WHERE data->>'name' ILIKE '%john%';
-- GOOD: Use full-text search
SELECT * FROM users
WHERE to_tsvector('english', data->>'name') @@ to_tsquery('john');

9.3 Bulk Operations

# Batch inserts for better performance
def bulk_insert(collection, documents, batch_size=1000):
    """Insert documents in batches for optimal performance"""
    for i in range(0, len(documents), batch_size):
        batch = documents[i:i + batch_size]
        collection.insert_many(batch, ordered=False)
        print(f"Inserted batch {i//batch_size + 1}")

-- Use COPY for large data imports (SQL mode)
COPY users (data) FROM '/path/to/data.json' WITH (FORMAT json);
-- Disable indexes during bulk load, rebuild after
ALTER INDEX idx_users_data DISABLE;
-- ... bulk insert ...
REINDEX INDEX idx_users_data;

9.4 Connection Pooling

# Configure connection pooling
from pymongo import MongoClient
client = MongoClient(
"mongodb://heliosdb-server:27017/",
maxPoolSize=100, # Maximum connections
minPoolSize=10, # Minimum connections to maintain
maxIdleTimeMS=30000, # Close idle connections after 30s
waitQueueTimeoutMS=5000, # Timeout waiting for connection
socketTimeoutMS=30000, # Socket operation timeout
serverSelectionTimeoutMS=5000 # Server selection timeout
)

9.5 Memory and Cache Configuration

-- Configure HeliosDB for document workloads
ALTER SYSTEM SET work_mem = '256MB'; -- For complex aggregations
ALTER SYSTEM SET shared_buffers = '8GB'; -- For caching
ALTER SYSTEM SET effective_cache_size = '24GB'; -- Query planner hint
-- Enable intelligent caching for hot collections
ALTER TABLE users ENABLE INTELLIGENT_CACHING;

10. Common Issues and Troubleshooting

10.1 Connection Issues

Problem: Connection refused

pymongo.errors.ServerSelectionTimeoutError: heliosdb-server:27017: [Errno 111] Connection refused

Solution:

# Check if HeliosDB is listening on MongoDB port
netstat -tlnp | grep 27017
# Verify MongoDB protocol is enabled
psql -c "SHOW helios.mongodb_port;"
# Check firewall rules
sudo firewall-cmd --list-ports | grep 27017

Problem: Authentication failed

pymongo.errors.OperationFailure: Authentication failed

Solution:

# Ensure correct auth mechanism
client = MongoClient(
"mongodb://user:password@heliosdb-server:27017/",
authMechanism="SCRAM-SHA-256", # Ensure this matches server config
authSource="admin"
)

10.2 Query Compatibility Issues

Problem: Unsupported operator

pymongo.errors.OperationFailure: Unknown operator: $customOp

Solution:

# Check compatibility matrix, use alternative
# Instead of custom operators, use $expr with aggregation expressions
# Instead of
db.collection.find({"$customOp": value})
# Use
db.collection.find({"$expr": {"$eq": ["$field", value]}})

Problem: Query returns different results

Solution:

# Check for type mismatches
# MongoDB may have stored numbers as strings
# Debug: Check actual stored type
doc = collection.find_one()
print(type(doc.get('amount'))) # Check if string vs number
# Fix: Ensure consistent types during migration
if isinstance(doc['amount'], str):
    collection.update_one(
        {"_id": doc["_id"]},
        {"$set": {"amount": float(doc['amount'])}}
    )

10.3 Performance Issues

Problem: Slow queries after migration

Solution:

-- Check if indexes exist
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'users';
-- Analyze table statistics
ANALYZE users;
-- Check query plan
EXPLAIN ANALYZE SELECT * FROM users WHERE data->>'status' = 'active';
-- Create missing indexes
CREATE INDEX CONCURRENTLY idx_users_status ON users ((data->>'status'));

Problem: High memory usage

Solution:

-- Check for large documents
SELECT pg_size_pretty(pg_total_relation_size('users')) AS total_size;
-- Find large documents
SELECT _id, pg_column_size(data) AS size
FROM users
ORDER BY size DESC
LIMIT 10;
-- Consider splitting large documents or using hybrid schema

10.4 Data Type Issues

Problem: ObjectId conversion errors

Solution:

from bson import ObjectId
import uuid
def objectid_to_uuid(oid):
    """Convert MongoDB ObjectId to UUID"""
    hex_str = str(oid)
    # Pad to 32 hex characters
    padded = hex_str.ljust(32, '0')
    return uuid.UUID(padded[:8] + '-' + padded[8:12] + '-' +
                     padded[12:16] + '-' + padded[16:20] + '-' + padded[20:32])

Problem: Date/time precision loss

Solution:

# MongoDB stores dates as milliseconds since epoch
# Ensure proper conversion
from datetime import datetime
def convert_date(mongo_date):
    """Preserve date precision during migration"""
    if isinstance(mongo_date, datetime):
        return mongo_date.isoformat()
    return mongo_date

10.5 Change Stream Issues

Problem: Change stream not receiving events

Solution:

# Ensure collection has proper CDC enabled
# Check via SQL
# SELECT * FROM pg_publication_tables WHERE pubname = 'mongodb_cdc';
# Restart change stream with explicit options
change_stream = collection.watch(
full_document='updateLookup', # Include full document on updates
max_await_time_ms=1000
)

10.6 Transaction Issues

Problem: Transaction aborted unexpectedly

Solution:

from pymongo.read_concern import ReadConcern
from pymongo.write_concern import WriteConcern

# Increase transaction timeout
with client.start_session() as session:
    with session.start_transaction(
        max_commit_time_ms=30000,  # 30 second timeout
        read_concern=ReadConcern("snapshot"),
        write_concern=WriteConcern("majority")
    ):
        # Transaction operations
        pass

10.7 Debugging Tools

# Enable MongoDB driver debugging
import logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger('pymongo')
logger.setLevel(logging.DEBUG)
-- Enable HeliosDB query logging
ALTER SYSTEM SET log_statement = 'all';
ALTER SYSTEM SET log_min_duration_statement = 100; -- Log queries > 100ms
SELECT pg_reload_conf();
-- View recent logs
SELECT * FROM pg_stat_statements ORDER BY total_time DESC LIMIT 20;

Appendix A: Quick Reference Card

Connection String Mapping

| MongoDB | HeliosDB |
| --- | --- |
| mongodb://host:27017/ | mongodb://host:27017/ |
| mongodb+srv://cluster/ | mongodb://heliosdb:27017/ |
| authSource=admin | authSource=admin |
| replicaSet=rs0 | Not needed (built-in HA) |

Common Query Translation

| MongoDB | HeliosDB SQL |
| --- | --- |
| find({}) | SELECT * FROM collection |
| find({a: 1}) | WHERE data->>'a' = '1' |
| find({a: {$gt: 1}}) | WHERE (data->>'a')::int > 1 |
| find({a: {$in: [1,2]}}) | WHERE data->>'a' IN ('1','2') |
| updateOne({}, {$set: {a: 1}}) | UPDATE SET data = jsonb_set(data, '{a}', '1') |
| aggregate([{$group: ...}]) | SELECT ... GROUP BY ... |

Useful HeliosDB JSON Functions

| Function | Purpose |
| --- | --- |
| data->>'field' | Extract field as text |
| data->'field' | Extract field as JSONB |
| data @> '{}' | JSONB containment |
| data ? 'key' | Key exists |
| jsonb_set() | Set nested value |
| jsonb_array_elements() | Unwind array |
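
Since HeliosDB also exposes a PostgreSQL-compatible SQL interface (the psql commands earlier in this guide rely on it), these functions can be exercised from application code as well. A hedged sketch using psycopg2 with placeholder connection details:

import psycopg2

conn = psycopg2.connect(host="heliosdb-server", dbname="mydb", user="admin", password="mypassword")
with conn, conn.cursor() as cur:
    # Extract a field as text and filter with JSONB containment in one query
    cur.execute("""
        SELECT data->>'username', data->'profile'
        FROM users
        WHERE data @> '{"status": "active"}'
        LIMIT 5
    """)
    for username, profile in cur.fetchall():
        print(username, profile)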

Appendix B: Migration Checklist Summary

Pre-Migration

  • Database inventory completed
  • Data volume estimated
  • GridFS usage identified and alternative planned
  • Backup created and verified
  • Test environment prepared

Migration

  • Schema designed (pure document, hybrid, or relational)
  • Data exported from MongoDB
  • Data imported to HeliosDB
  • Indexes recreated
  • Connection strings updated

Post-Migration

  • Document counts verified
  • Sample data validated
  • All queries tested
  • Performance benchmarked
  • Application tests passed
  • Change streams working
  • Monitoring configured


Last Updated: 2026-01-04 Migration Guide Version: 1.0 HeliosDB Compatibility: 7.0+