
MongoDB to HeliosDB Migration Guide

Version: 1.0 Last Updated: 2026-01-04


Table of Contents

  1. Introduction
  2. Compatibility Overview
  3. Pre-Migration Checklist
  4. Conceptual Mapping
  5. Step-by-Step Migration Process
  6. Query Translation Examples
  7. Feature Mapping
  8. Post-Migration Validation
  9. Performance Considerations
  10. Common Issues and Troubleshooting

1. Introduction

1.1 Why Migrate from MongoDB to HeliosDB?

HeliosDB offers significant advantages over MongoDB while maintaining wire protocol compatibility:

| Aspect | MongoDB | HeliosDB |
| --- | --- | --- |
| Data Model | Document-only | Unified multi-model (Document, Relational, Graph, Vector, Time-Series) |
| Query Language | MongoDB Query Language only | MQL, SQL, GraphQL, CQL, Redis commands |
| Transactions | Snapshot isolation | SERIALIZABLE isolation with zero-maintenance MVCC |
| Joins | $lookup only | Native SQL JOINs + $lookup |
| Analytics | Limited aggregation | Full OLAP + OLTP (HTAP) |
| Full-Text Search | External (Atlas Search) | Built-in with GraphRAG integration |
| Change Streams | Standard | CDC-backed with lower latency |
| Vector Search | Atlas Vector Search | Native HNSW with multi-modal embeddings |
| Licensing | SSPL | Enterprise-friendly |
| Sharding | Manual configuration | Transparent horizontal scaling |

1.2 Migration Benefits

  • Unified Platform: Consolidate multiple database systems into one
  • SQL Access: Query MongoDB-style documents with standard SQL
  • Enhanced Analytics: Run complex analytical queries on document data
  • Better Transactions: ACID guarantees with serializable isolation
  • Cost Reduction: Single database license, reduced operational overhead
  • GraphRAG Integration: AI-powered semantic search on document content

1.3 Wire Protocol Compatibility

HeliosDB implements the MongoDB wire protocol (OP_MSG), allowing existing MongoDB applications to connect without code changes:

# Existing MongoDB code works unchanged
from pymongo import MongoClient
# Simply change the hostname
client = MongoClient("mongodb://heliosdb-server:27017/")
db = client.my_database
collection = db.my_collection
# All operations work as expected
result = collection.find({"status": "active"})

2. Compatibility Overview

2.1 Wire Protocol Support

| Protocol Feature | Status | Notes |
| --- | --- | --- |
| OP_MSG | 100% | Modern protocol (MongoDB 3.6+) |
| OP_QUERY | 100% | Legacy support |
| OP_INSERT/UPDATE/DELETE | 100% | Legacy operations |
| BSON Encoding | 100% | All 19 BSON types |
| SCRAM-SHA-256 | 100% | Recommended authentication |
| TLS/SSL | 100% | Encrypted connections |

2.2 CRUD Operations

| Operation | Compatibility | Notes |
| --- | --- | --- |
| insertOne/insertMany | 100% | Full support |
| find/findOne | 100% | All query operators |
| updateOne/updateMany | 100% | All update operators |
| deleteOne/deleteMany | 100% | Full support |
| replaceOne | 100% | Document replacement |
| bulkWrite | 100% | Ordered and unordered |

2.3 Query Operators

| Category | Coverage | Examples |
| --- | --- | --- |
| Comparison | 100% | $eq, $gt, $lt, $in, $nin |
| Logical | 100% | $and, $or, $not, $nor |
| Element | 100% | $exists, $type |
| Evaluation | 100% | $regex, $text, $expr |
| Array | 100% | $all, $elemMatch, $size |
| Geospatial | 100% | $near, $geoWithin |
| Bitwise | 100% | $bitsAllSet, $bitsAnyClear |
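
Because every operator category above is supported, queries that mix several of them should run unchanged once the connection string points at HeliosDB. A minimal smoke test, assuming the placeholder host, database, and field names used elsewhere in this guide:

from pymongo import MongoClient

client = MongoClient("mongodb://heliosdb-server:27017/")
orders = client.mydb.orders

# Combines comparison, logical, element, and array operators in one query
cursor = orders.find({
    "$and": [
        {"amount": {"$gte": 100, "$lt": 1000}},                       # comparison
        {"$or": [{"status": "pending"}, {"status": "processing"}]},   # logical
        {"shipping_address": {"$exists": True}},                      # element
        {"tags": {"$all": ["priority", "gift"]}},                     # array
    ]
}).limit(5)
for doc in cursor:
    print(doc["_id"], doc.get("status"))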

2.4 Aggregation Pipeline

| Stage | Status | Notes |
| --- | --- | --- |
| $match | Supported | Filter documents |
| $project | Supported | Reshape documents |
| $group | Supported | Group with accumulators |
| $sort | Supported | Multi-field sorting |
| $lookup | Supported | Left outer join |
| $graphLookup | Supported | Recursive lookup |
| $unwind | Supported | Deconstruct arrays |
| $facet | Supported | Multi-faceted results |
| $setWindowFields | Supported | Window functions |
| $merge | Supported | Write to collection |
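
The same applies to pipeline stages: existing aggregations should execute as-is. A short, hedged sketch reusing the placeholder orders collection (field names are illustrative):

from pymongo import MongoClient

client = MongoClient("mongodb://heliosdb-server:27017/")
orders = client.mydb.orders

# $match, $unwind, $group, $sort, and $limit combined in one pipeline
pipeline = [
    {"$match": {"status": "completed"}},
    {"$unwind": "$items"},
    {"$group": {"_id": "$items.sku", "units": {"$sum": "$items.qty"}}},
    {"$sort": {"units": -1}},
    {"$limit": 5},
]
for row in orders.aggregate(pipeline):
    print(row)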

2.5 Known Limitations

| Feature | Status | Alternative |
| --- | --- | --- |
| GridFS | Not supported | Use blob storage or S3 integration |
| Capped collections | Not supported | Use TTL indexes |
| Server-side JavaScript | Not supported | Use SQL procedures |
| Map-reduce | Deprecated | Use aggregation pipeline |
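
GridFS content in particular must be moved out before migration. The sketch below, which assumes the standard gridfs driver package and uses a local directory as a stand-in for your blob store or S3 bucket, exports each file and records a reference document (a hypothetical file_refs collection) so applications can still locate it:

import pathlib
import gridfs
from pymongo import MongoClient

client = MongoClient("mongodb://source-mongodb:27017/")
db = client.mydb
fs = gridfs.GridFS(db)

# Export every GridFS file; replace the filesystem write with an S3/object-storage upload
out_dir = pathlib.Path("/backup/gridfs")
out_dir.mkdir(parents=True, exist_ok=True)

for grid_file in fs.find():
    target = out_dir / f"{grid_file._id}_{grid_file.filename or 'unnamed'}"
    target.write_bytes(grid_file.read())
    # Reference document pointing at the exported blob
    db.file_refs.insert_one({
        "gridfs_id": str(grid_file._id),
        "filename": grid_file.filename,
        "length": grid_file.length,
        "path": str(target),
    })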

3. Pre-Migration Checklist

3.1 Assessment Phase

  • Inventory databases and collections

    # List all databases
    mongosh --eval "db.adminCommand('listDatabases')"
    # List collections in each database
    mongosh mydb --eval "db.getCollectionNames()"
  • Document data volume

    // Get collection statistics
    db.collection.stats()
    // Count documents
    db.collection.countDocuments({})
  • Identify GridFS usage

    // Check for GridFS collections
    db.getCollectionNames().filter(n => n.endsWith('.files') || n.endsWith('.chunks'))
  • Review index definitions

    // List all indexes
    db.collection.getIndexes()
  • Analyze query patterns

    // Enable profiler
    db.setProfilingLevel(1, { slowms: 100 })
    // Review slow queries
    db.system.profile.find().sort({ ts: -1 }).limit(20)

3.2 Technical Requirements

  • HeliosDB version: 7.0 or later recommended
  • Storage capacity: 1.5x current MongoDB storage (for import headroom)
  • Network connectivity: MongoDB port 27017 accessible
  • Authentication: Prepare HeliosDB credentials
  • Backup verified: Recent MongoDB backup tested for restore

3.3 Application Assessment

  • Driver versions: Document current MongoDB driver versions
  • Connection strings: Inventory all application connection strings
  • Custom code: Review any server-side JavaScript or map-reduce usage (see the scan sketch after this list)
  • Change stream consumers: Document change stream applications
  • Test environment: Prepare isolated test environment
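
The custom-code review is easy to automate. A rough, hedged scan script (file extensions and patterns are assumptions; extend them for your codebase):

import pathlib
import re

# Features HeliosDB does not support (see section 2.5)
PATTERNS = {
    "map-reduce": re.compile(r"mapReduce|map_reduce"),
    "server-side JavaScript": re.compile(r"\$where|db\.eval"),
    "GridFS": re.compile(r"GridFS|GridFSBucket|gridfs"),
    "capped collections": re.compile(r"capped\s*[:=]\s*[Tt]rue"),
}

def scan_source_tree(root):
    """Print every line in the application source that uses a flagged feature."""
    for path in pathlib.Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in {".py", ".js", ".ts", ".java"}:
            continue
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            for feature, pattern in PATTERNS.items():
                if pattern.search(line):
                    print(f"{path}:{lineno}: uses {feature}")

if __name__ == "__main__":
    scan_source_tree("./src")  # adjust to your application's source root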

3.4 Risk Assessment

| Risk | Mitigation |
| --- | --- |
| Data loss | Comprehensive backup before migration |
| Application downtime | Plan maintenance window or dual-write strategy (see the sketch after this table) |
| Query incompatibility | Pre-test all application queries |
| Performance regression | Benchmark before and after |
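
For the downtime risk, one common approach is a temporary dual-write layer in the application so MongoDB stays authoritative while HeliosDB is validated. A minimal sketch (class, collection, and host names are illustrative, not part of HeliosDB itself):

from pymongo import MongoClient
from pymongo.errors import PyMongoError

class DualWriteCollection:
    """Write to both databases during the transition; read only from the current primary."""

    def __init__(self, primary, secondary):
        self.primary = primary      # existing MongoDB collection
        self.secondary = secondary  # HeliosDB collection (same wire protocol)

    def insert_one(self, doc):
        result = self.primary.insert_one(doc)  # also assigns _id to doc
        try:
            self.secondary.insert_one(doc)     # reuse the same _id in HeliosDB
        except PyMongoError as exc:
            # Log and reconcile later instead of failing the user-facing write
            print(f"dual-write insert failed on HeliosDB: {exc}")
        return result

    def find(self, *args, **kwargs):
        return self.primary.find(*args, **kwargs)

mongo_users = MongoClient("mongodb://mongodb-server:27017/").mydb.users
helios_users = MongoClient("mongodb://heliosdb-server:27017/").mydb.users
users = DualWriteCollection(mongo_users, helios_users)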

4. Conceptual Mapping

4.1 Collections to Tables

MongoDB collections map to HeliosDB tables with JSONB storage:

MongoDB Collection:

// MongoDB: users collection
{
"_id": ObjectId("507f1f77bcf86cd799439011"),
"username": "johndoe",
"email": "john@example.com",
"profile": {
"firstName": "John",
"lastName": "Doe",
"age": 30
},
"tags": ["developer", "mongodb"]
}

HeliosDB Table (Internal Representation):

-- Automatic table structure created by MongoDB protocol
CREATE TABLE users (
_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
data JSONB NOT NULL
);
-- The document is stored in the 'data' JSONB column
-- _id is extracted and indexed for fast lookups

Hybrid Schema (Optional Optimization):

-- For frequently queried fields, create hybrid schema
CREATE TABLE users (
_id UUID PRIMARY KEY,
username VARCHAR(100),
email VARCHAR(255),
profile JSONB,
tags TEXT[],
created_at TIMESTAMP DEFAULT NOW()
);
-- Create indexes on extracted fields
CREATE INDEX idx_users_username ON users(username);
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_tags ON users USING GIN(tags);

4.2 Documents to Rows with JSONB

| MongoDB Concept | HeliosDB Equivalent |
| --- | --- |
| Document | Row with JSONB column |
| Embedded document | Nested JSONB object |
| Array | JSONB array or PostgreSQL array |
| ObjectId | UUID (auto-converted) |
| ISODate | TIMESTAMP/TIMESTAMPTZ |
| NumberLong | BIGINT |
| NumberDecimal | NUMERIC |
| BinData | BYTEA |

Type Conversion Examples:

// MongoDB document with various types
{
"_id": ObjectId("507f1f77bcf86cd799439011"),
"price": NumberDecimal("19.99"),
"quantity": NumberLong(100),
"created": ISODate("2025-01-01T00:00:00Z"),
"metadata": BinData(0, "base64data=="),
"location": {
"type": "Point",
"coordinates": [-73.97, 40.77]
}
}
-- HeliosDB JSONB representation
{
"_id": "507f1f77-bcf8-6cd7-9943-9011507f1f77",
"price": 19.99,
"quantity": 100,
"created": "2025-01-01T00:00:00Z",
"metadata": "\\x626173653634646174613d3d",
"location": {
"type": "Point",
"coordinates": [-73.97, 40.77]
}
}

4.3 Indexes

| MongoDB Index | HeliosDB Equivalent |
| --- | --- |
| Single field | B-tree index on JSONB path |
| Compound | Composite B-tree index |
| Unique | Unique constraint/index |
| TTL | TTL extension on JSONB |
| Text | GIN index with full-text search |
| 2dsphere | GiST geospatial index |
| Hashed | Hash index |
| Wildcard | GIN index on JSONB |

Index Migration Examples:

// MongoDB indexes
db.users.createIndex({ "email": 1 }, { unique: true })
db.users.createIndex({ "profile.age": 1, "status": 1 })
db.users.createIndex({ "location": "2dsphere" })
db.users.createIndex({ "content": "text", "title": "text" })
db.users.createIndex({ "createdAt": 1 }, { expireAfterSeconds: 3600 })
-- HeliosDB equivalent indexes
CREATE UNIQUE INDEX idx_users_email ON users ((data->>'email'));
CREATE INDEX idx_users_age_status ON users ((data->'profile'->>'age'), (data->>'status'));
CREATE INDEX idx_users_location ON users USING GIST ((data->'location'));
CREATE INDEX idx_users_content ON users USING GIN (to_tsvector('english',
COALESCE(data->>'content', '') || ' ' || COALESCE(data->>'title', '')));
-- TTL implemented via background job or HeliosDB TTL extension

4.4 Database and Collection Namespaces

| MongoDB | HeliosDB |
| --- | --- |
| Database | Schema |
| Collection | Table |
| mydb.users | mydb.users (schema.table) |
| admin | heliosdb_admin (system schema) |

5. Step-by-Step Migration Process

5.1 Schema Design in HeliosDB

Option A: Pure Document Mode (Simplest)

-- HeliosDB automatically creates this structure
-- when accessed via MongoDB protocol
CREATE SCHEMA IF NOT EXISTS mydb;
CREATE TABLE mydb.users (
_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
data JSONB NOT NULL
);
CREATE INDEX idx_users_data ON mydb.users USING GIN (data jsonb_path_ops);

Option B: Hybrid Schema (Extracted Fields + JSONB)

-- Design schema with frequently queried fields extracted
CREATE TABLE mydb.users (
_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
username VARCHAR(100) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
status VARCHAR(20) DEFAULT 'active',
profile JSONB,
tags TEXT[],
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Indexes for common query patterns
CREATE INDEX idx_users_status ON mydb.users(status);
CREATE INDEX idx_users_created ON mydb.users(created_at DESC);
CREATE INDEX idx_users_profile ON mydb.users USING GIN (profile);
CREATE INDEX idx_users_tags ON mydb.users USING GIN (tags);

Option C: Full Relational Conversion (Best for Analytics)

-- Normalize embedded documents into separate tables
CREATE TABLE mydb.users (
_id UUID PRIMARY KEY,
username VARCHAR(100) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
status VARCHAR(20) DEFAULT 'active',
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE mydb.user_profiles (
user_id UUID PRIMARY KEY REFERENCES mydb.users(_id),
first_name VARCHAR(100),
last_name VARCHAR(100),
age INTEGER,
bio TEXT
);
CREATE TABLE mydb.user_tags (
id SERIAL PRIMARY KEY,
user_id UUID REFERENCES mydb.users(_id),
tag VARCHAR(50)
);
CREATE INDEX idx_user_tags_user ON mydb.user_tags(user_id);
CREATE INDEX idx_user_tags_tag ON mydb.user_tags(tag);

5.2 Data Export Using mongodump/mongoexport

Method 1: mongodump (BSON Export)

# Export entire database
mongodump \
--uri="mongodb://source-mongodb:27017/mydb" \
--out=/backup/mongodb \
--gzip
# Export specific collection
mongodump \
--uri="mongodb://source-mongodb:27017/mydb" \
--collection=users \
--out=/backup/mongodb \
--gzip
# Export with query filter (incremental)
mongodump \
--uri="mongodb://source-mongodb:27017/mydb" \
--collection=orders \
--query='{"createdAt": {"$gte": {"$date": "2025-01-01T00:00:00Z"}}}' \
--out=/backup/mongodb

Method 2: mongoexport (JSON/CSV, Good for Transformation)

# Export to JSON (extended JSON format)
mongoexport \
--uri="mongodb://source-mongodb:27017/mydb" \
--collection=users \
--out=users.json \
--jsonArray
# Export to CSV (for relational import)
mongoexport \
--uri="mongodb://source-mongodb:27017/mydb" \
--collection=users \
--type=csv \
--fields=_id,username,email,profile.firstName,profile.lastName \
--out=users.csv
# Export with query filter
mongoexport \
--uri="mongodb://source-mongodb:27017/mydb" \
--collection=orders \
--query='{"status": "completed"}' \
--out=completed_orders.json

5.3 Data Transformation and Import

Direct Import via mongorestore (Fastest)

# Import directly to HeliosDB (uses MongoDB wire protocol)
mongorestore \
--uri="mongodb://heliosdb-server:27017/mydb" \
--gzip \
/backup/mongodb/mydb
# Import specific collection
mongorestore \
--uri="mongodb://heliosdb-server:27017/mydb" \
--collection=users \
--gzip \
/backup/mongodb/mydb/users.bson.gz
# Import with options
mongorestore \
--uri="mongodb://heliosdb-server:27017/mydb" \
--drop \
--numInsertionWorkers=4 \
--batchSize=1000 \
/backup/mongodb/mydb

JSON Import via SQL COPY

# Convert MongoDB extended JSON to standard JSON
cat users.json | jq -c '.[] | {_id: ._id."$oid", data: .}' > users_transformed.json
# Import into HeliosDB using COPY
psql -h heliosdb-server -U admin -d mydb -c "
COPY users (data) FROM STDIN WITH (FORMAT json);
" < users_transformed.json

Python Migration Script (Complex Transformations)

#!/usr/bin/env python3
"""
MongoDB to HeliosDB Migration Script
Handles complex data transformations during migration
"""
from pymongo import MongoClient
from datetime import datetime
import uuid
import json

# Source MongoDB
source_client = MongoClient("mongodb://source-mongodb:27017/")
source_db = source_client.mydb

# Target HeliosDB (via MongoDB protocol)
target_client = MongoClient("mongodb://heliosdb-server:27017/")
target_db = target_client.mydb


def convert_objectid(oid):
    """Convert MongoDB ObjectId to UUID format"""
    hex_str = str(oid)
    # Pad to 32 characters for UUID
    padded = hex_str.ljust(32, '0')
    return uuid.UUID(padded[:8] + '-' + padded[8:12] + '-' +
                     padded[12:16] + '-' + padded[16:20] + '-' + padded[20:32])


def transform_document(doc):
    """Transform document for HeliosDB compatibility"""
    transformed = {}
    for key, value in doc.items():
        if key == '_id':
            # Convert ObjectId to UUID string
            transformed['_id'] = str(convert_objectid(value))
        elif isinstance(value, datetime):
            # Convert datetime to ISO string
            transformed[key] = value.isoformat()
        elif isinstance(value, dict):
            # Recursively transform nested documents
            transformed[key] = transform_document(value)
        elif isinstance(value, list):
            # Transform array elements
            transformed[key] = [
                transform_document(v) if isinstance(v, dict) else v
                for v in value
            ]
        else:
            transformed[key] = value
    return transformed


def migrate_collection(collection_name, batch_size=1000):
    """Migrate a single collection with batching"""
    source_collection = source_db[collection_name]
    target_collection = target_db[collection_name]
    total = source_collection.count_documents({})
    migrated = 0
    print(f"Migrating {collection_name}: {total} documents")
    cursor = source_collection.find().batch_size(batch_size)
    batch = []
    for doc in cursor:
        transformed = transform_document(doc)
        batch.append(transformed)
        if len(batch) >= batch_size:
            target_collection.insert_many(batch, ordered=False)
            migrated += len(batch)
            print(f"  Progress: {migrated}/{total} ({100*migrated/total:.1f}%)")
            batch = []
    # Insert remaining documents
    if batch:
        target_collection.insert_many(batch, ordered=False)
        migrated += len(batch)
    print(f"  Completed: {migrated} documents migrated")
    return migrated


def migrate_indexes(collection_name):
    """Recreate indexes in target collection"""
    source_collection = source_db[collection_name]
    target_collection = target_db[collection_name]
    indexes = source_collection.index_information()
    for name, info in indexes.items():
        if name == '_id_':
            continue  # Skip default _id index
        keys = info['key']
        options = {k: v for k, v in info.items()
                   if k not in ('key', 'v', 'ns')}
        try:
            target_collection.create_index(keys, name=name, **options)
            print(f"  Created index: {name}")
        except Exception as e:
            print(f"  Warning: Could not create index {name}: {e}")


def main():
    collections = source_db.list_collection_names()
    # Filter out system collections
    collections = [c for c in collections
                   if not c.startswith('system.')]
    print(f"Found {len(collections)} collections to migrate")
    for collection_name in collections:
        migrate_collection(collection_name)
        migrate_indexes(collection_name)
    print("\nMigration complete!")


if __name__ == "__main__":
    main()

5.4 Application Connection Changes

Python (PyMongo)

# Before: MongoDB
from pymongo import MongoClient
client = MongoClient("mongodb://mongodb-server:27017/")
# After: HeliosDB (same code, different hostname)
from pymongo import MongoClient
client = MongoClient("mongodb://heliosdb-server:27017/")
# With authentication
client = MongoClient(
"mongodb://heliosdb-server:27017/",
username="myuser",
password="mypassword",
authSource="admin",
authMechanism="SCRAM-SHA-256"
)
# With connection pooling and options
client = MongoClient(
"mongodb://heliosdb-server:27017/",
maxPoolSize=50,
minPoolSize=10,
maxIdleTimeMS=30000,
serverSelectionTimeoutMS=5000
)

Node.js (mongodb driver)

// Before: MongoDB
const { MongoClient } = require('mongodb');
const client = new MongoClient('mongodb://mongodb-server:27017/');
// After: HeliosDB
const client = new MongoClient('mongodb://heliosdb-server:27017/', {
auth: {
username: 'myuser',
password: 'mypassword'
},
authSource: 'admin',
maxPoolSize: 50
});

Java (MongoDB Driver)

// Before: MongoDB
MongoClient client = MongoClients.create("mongodb://mongodb-server:27017/");
// After: HeliosDB
MongoClientSettings settings = MongoClientSettings.builder()
.applyConnectionString(new ConnectionString("mongodb://heliosdb-server:27017/"))
.credential(MongoCredential.createScramSha256Credential(
"myuser", "admin", "mypassword".toCharArray()))
.build();
MongoClient client = MongoClients.create(settings);

Environment-Based Configuration

import os
# Use environment variable for easy switching
MONGODB_URI = os.environ.get(
"MONGODB_URI",
"mongodb://heliosdb-server:27017/"
)
client = MongoClient(MONGODB_URI)

6. Query Translation Examples

6.1 find() to SELECT

Basic Find

// MongoDB
db.users.find({ status: "active" })
-- HeliosDB SQL equivalent
SELECT * FROM users WHERE data->>'status' = 'active';
-- Or with hybrid schema
SELECT * FROM users WHERE status = 'active';

Find with Projection

// MongoDB
db.users.find(
{ status: "active" },
{ username: 1, email: 1, _id: 0 }
)
-- HeliosDB SQL
SELECT
data->>'username' AS username,
data->>'email' AS email
FROM users
WHERE data->>'status' = 'active';

Find with Operators

// MongoDB: comparison operators
db.orders.find({
amount: { $gte: 100, $lte: 500 },
status: { $in: ["pending", "processing"] }
})
-- HeliosDB SQL
SELECT * FROM orders
WHERE (data->>'amount')::numeric BETWEEN 100 AND 500
AND data->>'status' IN ('pending', 'processing');

Find with Nested Fields

// MongoDB: nested document query
db.users.find({ "profile.age": { $gte: 21 } })
-- HeliosDB SQL
SELECT * FROM users
WHERE (data->'profile'->>'age')::integer >= 21;
-- Using jsonpath (more efficient)
SELECT * FROM users
WHERE data @@ '$.profile.age >= 21';

Find with Arrays

// MongoDB: array queries
db.users.find({ tags: "mongodb" }) // contains
db.users.find({ tags: { $all: ["mongodb", "developer"] } }) // contains all
db.users.find({ "items.price": { $gt: 100 } }) // array element match
-- HeliosDB SQL: array contains
SELECT * FROM users WHERE data->'tags' ? 'mongodb';
-- Contains all
SELECT * FROM users
WHERE data->'tags' ?& array['mongodb', 'developer'];
-- Array element match
SELECT * FROM users
WHERE EXISTS (
SELECT 1 FROM jsonb_array_elements(data->'items') AS item
WHERE (item->>'price')::numeric > 100
);

6.2 aggregate() to SQL with JSON Functions

Group and Count

// MongoDB aggregation
db.orders.aggregate([
{ $match: { status: "completed" } },
{ $group: {
_id: "$category",
total: { $sum: "$amount" },
count: { $sum: 1 },
avgAmount: { $avg: "$amount" }
}},
{ $sort: { total: -1 } },
{ $limit: 10 }
])
-- HeliosDB SQL
SELECT
data->>'category' AS category,
SUM((data->>'amount')::numeric) AS total,
COUNT(*) AS count,
AVG((data->>'amount')::numeric) AS avg_amount
FROM orders
WHERE data->>'status' = 'completed'
GROUP BY data->>'category'
ORDER BY total DESC
LIMIT 10;

Lookup (Join)

// MongoDB $lookup
db.orders.aggregate([
{ $lookup: {
from: "customers",
localField: "customer_id",
foreignField: "_id",
as: "customer"
}},
{ $unwind: "$customer" }
])
-- HeliosDB SQL (much simpler with native JOINs)
SELECT
o.*,
c.data AS customer
FROM orders o
JOIN customers c ON o.data->>'customer_id' = c._id::text;
-- Or with JSONB aggregation to mimic $unwind result
SELECT
o.data || jsonb_build_object('customer', c.data) AS document
FROM orders o
JOIN customers c ON o.data->>'customer_id' = c._id::text;

Facet (Multi-Faceted Results)

// MongoDB $facet
db.products.aggregate([
{ $facet: {
byCategory: [
{ $group: { _id: "$category", count: { $sum: 1 } } }
],
priceRange: [
{ $bucket: {
groupBy: "$price",
boundaries: [0, 50, 100, 500],
default: "Other"
}}
],
totalCount: [
{ $count: "count" }
]
}}
])
-- HeliosDB SQL: Use separate queries or WITH clause
WITH
by_category AS (
SELECT data->>'category' AS category, COUNT(*) AS count
FROM products
GROUP BY data->>'category'
),
price_range AS (
SELECT
CASE
WHEN (data->>'price')::numeric < 50 THEN '0-50'
WHEN (data->>'price')::numeric < 100 THEN '50-100'
WHEN (data->>'price')::numeric < 500 THEN '100-500'
ELSE 'Other'
END AS range,
COUNT(*) AS count
FROM products
GROUP BY 1
),
total AS (
SELECT COUNT(*) AS count FROM products
)
SELECT
jsonb_build_object(
'byCategory', (SELECT jsonb_agg(row_to_json(by_category)) FROM by_category),
'priceRange', (SELECT jsonb_agg(row_to_json(price_range)) FROM price_range),
'totalCount', (SELECT count FROM total)
) AS facets;

6.3 update() to UPDATE with JSONB Operators

Basic Update

// MongoDB
db.users.updateOne(
{ _id: ObjectId("...") },
{ $set: { status: "active" } }
)
-- HeliosDB SQL
UPDATE users
SET data = jsonb_set(data, '{status}', '"active"')
WHERE _id = '...'::uuid;
-- With hybrid schema
UPDATE users SET status = 'active' WHERE _id = '...'::uuid;

Increment

// MongoDB
db.products.updateMany(
{ category: "electronics" },
{ $inc: { viewCount: 1, stock: -1 } }
)
-- HeliosDB SQL
UPDATE products
SET data = data || jsonb_build_object(
'viewCount', COALESCE((data->>'viewCount')::integer, 0) + 1,
'stock', COALESCE((data->>'stock')::integer, 0) - 1
)
WHERE data->>'category' = 'electronics';

Array Push

// MongoDB
db.users.updateOne(
{ _id: ObjectId("...") },
{ $push: { tags: "premium" } }
)
-- HeliosDB SQL
UPDATE users
SET data = jsonb_set(
data,
'{tags}',
COALESCE(data->'tags', '[]'::jsonb) || '"premium"'::jsonb
)
WHERE _id = '...'::uuid;

Array Pull

// MongoDB
db.users.updateOne(
{ _id: ObjectId("...") },
{ $pull: { tags: "temporary" } }
)
-- HeliosDB SQL
UPDATE users
SET data = jsonb_set(
data,
'{tags}',
(SELECT jsonb_agg(elem)
FROM jsonb_array_elements(data->'tags') AS elem
WHERE elem != '"temporary"')
)
WHERE _id = '...'::uuid;

Upsert

// MongoDB
db.users.updateOne(
{ email: "new@example.com" },
{ $set: { name: "New User", createdAt: new Date() } },
{ upsert: true }
)
-- HeliosDB SQL
INSERT INTO users (data)
VALUES ('{"email": "new@example.com", "name": "New User", "createdAt": "..."}'::jsonb)
ON CONFLICT ((data->>'email'))
DO UPDATE SET data = users.data || EXCLUDED.data;

7. Feature Mapping

7.1 Aggregation Pipeline Equivalents

| MongoDB Stage | HeliosDB SQL Equivalent |
| --- | --- |
| $match | WHERE clause |
| $project | SELECT with JSON functions |
| $group | GROUP BY with aggregates |
| $sort | ORDER BY |
| $limit | LIMIT |
| $skip | OFFSET |
| $unwind | LATERAL jsonb_array_elements() |
| $lookup | JOIN |
| $graphLookup | WITH RECURSIVE |
| $bucket | CASE WHEN in GROUP BY |
| $facet | Multiple CTEs |
| $setWindowFields | Window functions |

7.2 Change Streams vs HeliosDB CDC

| Feature | MongoDB Change Streams | HeliosDB CDC |
| --- | --- | --- |
| Protocol | Native MongoDB | Native + Debezium-compatible |
| Latency | ~100ms | <10ms |
| Resume | Token-based | LSN-based (more reliable) |
| Filtering | Pipeline stages | SQL predicates |
| Output | BSON | JSON, Avro, Protobuf |
| Destinations | Application code | Kafka, Webhooks, S3, etc. |

MongoDB Change Stream:

const pipeline = [
{ $match: { operationType: "insert" } }
];
const changeStream = collection.watch(pipeline);
changeStream.on("change", (change) => {
console.log(change);
});

HeliosDB CDC (via MongoDB protocol - identical API):

// Same API works with HeliosDB
const changeStream = collection.watch(pipeline);
changeStream.on("change", (change) => {
console.log(change);
});

HeliosDB CDC (direct SQL subscription):

-- Create CDC subscription
CREATE PUBLICATION users_changes FOR TABLE users;
-- Subscribe via logical replication
CREATE SUBSCRIPTION users_sub
CONNECTION 'host=heliosdb-server dbname=mydb'
PUBLICATION users_changes;

HeliosDB Webhook CDC:

-- Configure webhook notification
SELECT helios_cdc.create_webhook(
'users',
'https://myapp.com/webhooks/user-changes',
'{"events": ["INSERT", "UPDATE", "DELETE"]}'
);

7.3 Geospatial Queries

| MongoDB | HeliosDB |
| --- | --- |
| 2dsphere index | GiST index with geography |
| $near | ST_DWithin + ORDER BY ST_Distance |
| $geoWithin | ST_Within |
| $geoIntersects | ST_Intersects |
| GeoJSON | Native GeoJSON support |

MongoDB Geospatial:

// Create 2dsphere index
db.places.createIndex({ location: "2dsphere" })
// Find near point
db.places.find({
location: {
$near: {
$geometry: { type: "Point", coordinates: [-73.97, 40.77] },
$maxDistance: 1000
}
}
})
// Find within polygon
db.places.find({
location: {
$geoWithin: {
$geometry: {
type: "Polygon",
coordinates: [[[-73.98, 40.76], [-73.96, 40.76], [-73.96, 40.78], [-73.98, 40.78], [-73.98, 40.76]]]
}
}
}
})

HeliosDB SQL Geospatial:

-- Create spatial index
CREATE INDEX idx_places_location ON places
USING GIST (ST_GeomFromGeoJSON(data->>'location'));
-- Find near point (within 1000 meters)
SELECT * FROM places
WHERE ST_DWithin(
ST_GeomFromGeoJSON(data->>'location')::geography,
ST_SetSRID(ST_MakePoint(-73.97, 40.77), 4326)::geography,
1000
)
ORDER BY ST_Distance(
ST_GeomFromGeoJSON(data->>'location')::geography,
ST_SetSRID(ST_MakePoint(-73.97, 40.77), 4326)::geography
);
-- Find within polygon
SELECT * FROM places
WHERE ST_Within(
ST_GeomFromGeoJSON(data->>'location'),
ST_GeomFromGeoJSON('{
"type": "Polygon",
"coordinates": [[[-73.98, 40.76], [-73.96, 40.76], [-73.96, 40.78], [-73.98, 40.78], [-73.98, 40.76]]]
}')
);

7.4 Full-Text Search

| MongoDB | HeliosDB |
| --- | --- |
| Text index | GIN index with tsvector |
| $text search | to_tsvector/to_tsquery |
| Score sorting | ts_rank |
| Language support | 30+ languages |
| Stemming | Built-in |

MongoDB Text Search:

// Create text index
db.articles.createIndex({ title: "text", content: "text" })
// Search
db.articles.find(
{ $text: { $search: "mongodb database" } },
{ score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } })

HeliosDB Full-Text Search:

-- Create full-text index
CREATE INDEX idx_articles_fts ON articles
USING GIN (to_tsvector('english',
COALESCE(data->>'title', '') || ' ' || COALESCE(data->>'content', '')));
-- Search with ranking
SELECT
*,
ts_rank(
to_tsvector('english', COALESCE(data->>'title', '') || ' ' || COALESCE(data->>'content', '')),
plainto_tsquery('english', 'mongodb database')
) AS score
FROM articles
WHERE to_tsvector('english', COALESCE(data->>'title', '') || ' ' || COALESCE(data->>'content', ''))
@@ plainto_tsquery('english', 'mongodb database')
ORDER BY score DESC;

HeliosDB GraphRAG Semantic Search:

-- Enable semantic search (AI-powered)
SELECT helios_graphrag.enable_semantic_search('articles', 'content');
-- Semantic search
SELECT * FROM helios_graphrag.semantic_search(
'articles',
'How do document databases handle transactions?',
10 -- top 10 results
);

8. Post-Migration Validation

8.1 Data Integrity Checks

// Compare document counts
// Source MongoDB
const sourceCount = db.users.countDocuments({});
console.log(`Source count: ${sourceCount}`);
// Target HeliosDB
const targetCount = db.users.countDocuments({});
console.log(`Target count: ${targetCount}`);
// Verify counts match
if (sourceCount !== targetCount) {
console.error(`Count mismatch! Source: ${sourceCount}, Target: ${targetCount}`);
}

def validate_migration(source_db, target_db, collection_name, sample_size=1000):
    """Validate migrated data integrity"""
    source = source_db[collection_name]
    target = target_db[collection_name]
    # Count validation
    source_count = source.count_documents({})
    target_count = target.count_documents({})
    if source_count != target_count:
        print(f"ERROR: Count mismatch - Source: {source_count}, Target: {target_count}")
        return False
    # Sample validation
    sample = list(source.aggregate([{"$sample": {"size": sample_size}}]))
    for doc in sample:
        target_doc = target.find_one({"_id": doc["_id"]})
        if target_doc is None:
            print(f"ERROR: Document {doc['_id']} not found in target")
            return False
        # Compare key fields (adjust based on your schema)
        for key in ["username", "email", "status"]:
            if doc.get(key) != target_doc.get(key):
                print(f"ERROR: Field {key} mismatch for {doc['_id']}")
                return False
    print(f"Validation passed: {source_count} documents verified")
    return True

8.2 Query Validation

from datetime import datetime

def validate_queries(source_db, target_db, collection_name):
    """Run test queries on both databases and compare results"""
    test_queries = [
        # Basic find
        {"status": "active"},
        # Range query
        {"created_at": {"$gte": datetime(2025, 1, 1)}},
        # Nested field
        {"profile.age": {"$gte": 21}},
        # Array query
        {"tags": {"$in": ["premium", "vip"]}},
    ]
    source = source_db[collection_name]
    target = target_db[collection_name]
    for query in test_queries:
        source_results = list(source.find(query).limit(100))
        target_results = list(target.find(query).limit(100))
        if len(source_results) != len(target_results):
            print(f"Query result count mismatch for {query}")
            print(f"  Source: {len(source_results)}, Target: {len(target_results)}")
        else:
            print(f"Query {query}: OK ({len(source_results)} results)")

8.3 Index Validation

// Compare indexes
function compareIndexes(sourceDb, targetDb, collectionName) {
const sourceIndexes = sourceDb[collectionName].getIndexes();
const targetIndexes = targetDb[collectionName].getIndexes();
const sourceNames = sourceIndexes.map(i => i.name).sort();
const targetNames = targetIndexes.map(i => i.name).sort();
console.log("Source indexes:", sourceNames);
console.log("Target indexes:", targetNames);
// Check for missing indexes
for (const name of sourceNames) {
if (!targetNames.includes(name)) {
console.log(`WARNING: Index ${name} missing in target`);
}
}
}

8.4 Performance Validation

import time

def benchmark_queries(db, collection_name, iterations=100):
    """Benchmark common query patterns"""
    collection = db[collection_name]
    # Grab an existing _id so the point-lookup benchmark has a real target
    sample_id = collection.find_one()["_id"]
    benchmarks = {
        "find_by_id": lambda: collection.find_one({"_id": sample_id}),
        "find_by_index": lambda: list(collection.find({"status": "active"}).limit(10)),
        "aggregation": lambda: list(collection.aggregate([
            {"$match": {"status": "active"}},
            {"$group": {"_id": "$category", "count": {"$sum": 1}}}
        ])),
        "text_search": lambda: list(collection.find(
            {"$text": {"$search": "search terms"}}
        ).limit(10)),
    }
    results = {}
    for name, query_func in benchmarks.items():
        times = []
        for _ in range(iterations):
            start = time.time()
            query_func()
            times.append(time.time() - start)
        results[name] = {
            "avg_ms": sum(times) / len(times) * 1000,
            "min_ms": min(times) * 1000,
            "max_ms": max(times) * 1000
        }
        print(f"{name}: avg={results[name]['avg_ms']:.2f}ms")
    return results

8.5 Application Testing Checklist

  • Unit tests pass against HeliosDB (a smoke-test sketch follows this list)
  • Integration tests pass with new connection strings
  • CRUD operations work correctly
  • Aggregation pipelines return expected results
  • Change streams receive events
  • Transactions commit and rollback properly
  • Authentication works with all user accounts
  • Connection pooling handles load correctly
  • Error handling catches expected exceptions
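
A small automated smoke test covering the first few checklist items can run in CI against both databases. A pytest-style sketch (connection string, database, and field names are placeholders for your environment):

import pytest
from pymongo import MongoClient

@pytest.fixture(scope="module")
def collection():
    client = MongoClient("mongodb://heliosdb-server:27017/", serverSelectionTimeoutMS=5000)
    coll = client.test_migration.smoke
    coll.delete_many({})
    yield coll
    client.drop_database("test_migration")

def test_crud_round_trip(collection):
    inserted = collection.insert_one({"name": "alice", "status": "active"})
    assert collection.find_one({"_id": inserted.inserted_id})["name"] == "alice"
    collection.update_one({"_id": inserted.inserted_id}, {"$set": {"status": "inactive"}})
    assert collection.count_documents({"status": "inactive"}) == 1
    collection.delete_one({"_id": inserted.inserted_id})
    assert collection.count_documents({}) == 0

def test_aggregation(collection):
    collection.insert_many([{"cat": "a", "n": 1}, {"cat": "a", "n": 2}, {"cat": "b", "n": 3}])
    totals = {row["_id"]: row["total"] for row in collection.aggregate(
        [{"$group": {"_id": "$cat", "total": {"$sum": "$n"}}}])}
    assert totals == {"a": 3, "b": 6}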

9. Performance Considerations

9.1 Index Optimization

-- Analyze query patterns
EXPLAIN ANALYZE SELECT * FROM users WHERE data->>'status' = 'active';
-- Create expression indexes for frequent queries
CREATE INDEX idx_users_status ON users ((data->>'status'));
CREATE INDEX idx_users_email ON users ((data->>'email'));
-- Create GIN index for flexible JSON queries
CREATE INDEX idx_users_data ON users USING GIN (data jsonb_path_ops);
-- For range queries on numeric fields
CREATE INDEX idx_orders_amount ON orders (((data->>'amount')::numeric));
-- Composite index for multi-field queries
CREATE INDEX idx_orders_status_date ON orders (
(data->>'status'),
((data->>'created_at')::timestamptz)
);

9.2 Query Optimization

-- Use JSONB containment for exact matches (uses GIN index)
SELECT * FROM users WHERE data @> '{"status": "active"}';
-- Use jsonpath for complex queries
SELECT * FROM users
WHERE data @@ '$.profile.age > 21 && $.tags[*] == "premium"';
-- Avoid full table scans with proper indexes
-- BAD: No index usage
SELECT * FROM users WHERE data->>'name' ILIKE '%john%';
-- GOOD: Use full-text search
SELECT * FROM users
WHERE to_tsvector('english', data->>'name') @@ to_tsquery('john');

9.3 Bulk Operations

# Batch inserts for better performance
def bulk_insert(collection, documents, batch_size=1000):
    """Insert documents in batches for optimal performance"""
    for i in range(0, len(documents), batch_size):
        batch = documents[i:i + batch_size]
        collection.insert_many(batch, ordered=False)
        print(f"Inserted batch {i//batch_size + 1}")

-- Use COPY for large data imports (SQL mode)
COPY users (data) FROM '/path/to/data.json' WITH (FORMAT json);
-- Disable indexes during bulk load, rebuild after
ALTER INDEX idx_users_data DISABLE;
-- ... bulk insert ...
REINDEX INDEX idx_users_data;

9.4 Connection Pooling

# Configure connection pooling
from pymongo import MongoClient
client = MongoClient(
"mongodb://heliosdb-server:27017/",
maxPoolSize=100, # Maximum connections
minPoolSize=10, # Minimum connections to maintain
maxIdleTimeMS=30000, # Close idle connections after 30s
waitQueueTimeoutMS=5000, # Timeout waiting for connection
socketTimeoutMS=30000, # Socket operation timeout
serverSelectionTimeoutMS=5000 # Server selection timeout
)

9.5 Memory and Cache Configuration

-- Configure HeliosDB for document workloads
ALTER SYSTEM SET work_mem = '256MB'; -- For complex aggregations
ALTER SYSTEM SET shared_buffers = '8GB'; -- For caching
ALTER SYSTEM SET effective_cache_size = '24GB'; -- Query planner hint
-- Enable intelligent caching for hot collections
ALTER TABLE users ENABLE INTELLIGENT_CACHING;

10. Common Issues and Troubleshooting

10.1 Connection Issues

Problem: Connection refused

pymongo.errors.ServerSelectionTimeoutError: heliosdb-server:27017: [Errno 111] Connection refused

Solution:

# Check if HeliosDB is listening on MongoDB port
netstat -tlnp | grep 27017
# Verify MongoDB protocol is enabled
psql -c "SHOW helios.mongodb_port;"
# Check firewall rules
sudo firewall-cmd --list-ports | grep 27017

Problem: Authentication failed

pymongo.errors.OperationFailure: Authentication failed

Solution:

# Ensure correct auth mechanism
client = MongoClient(
"mongodb://user:password@heliosdb-server:27017/",
authMechanism="SCRAM-SHA-256", # Ensure this matches server config
authSource="admin"
)

10.2 Query Compatibility Issues

Problem: Unsupported operator

pymongo.errors.OperationFailure: Unknown operator: $customOp

Solution:

# Check compatibility matrix, use alternative
# Instead of custom operators, use $expr with aggregation expressions
# Instead of
db.collection.find({"$customOp": value})
# Use
db.collection.find({"$expr": {"$eq": ["$field", value]}})

Problem: Query returns different results

Solution:

# Check for type mismatches
# MongoDB may have stored numbers as strings
# Debug: Check actual stored type
doc = collection.find_one()
print(type(doc.get('amount'))) # Check if string vs number
# Fix: Ensure consistent types during migration
if isinstance(doc['amount'], str):
    collection.update_one(
        {"_id": doc["_id"]},
        {"$set": {"amount": float(doc['amount'])}}
    )

10.3 Performance Issues

Problem: Slow queries after migration

Solution:

-- Check if indexes exist
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'users';
-- Analyze table statistics
ANALYZE users;
-- Check query plan
EXPLAIN ANALYZE SELECT * FROM users WHERE data->>'status' = 'active';
-- Create missing indexes
CREATE INDEX CONCURRENTLY idx_users_status ON users ((data->>'status'));

Problem: High memory usage

Solution:

-- Check for large documents
SELECT pg_size_pretty(pg_total_relation_size('users')) AS total_size;
-- Find large documents
SELECT _id, pg_column_size(data) AS size
FROM users
ORDER BY size DESC
LIMIT 10;
-- Consider splitting large documents or using hybrid schema

10.4 Data Type Issues

Problem: ObjectId conversion errors

Solution:

from bson import ObjectId
import uuid
def objectid_to_uuid(oid):
    """Convert MongoDB ObjectId to UUID"""
    hex_str = str(oid)
    # Pad to 32 hex characters
    padded = hex_str.ljust(32, '0')
    return uuid.UUID(padded[:8] + '-' + padded[8:12] + '-' +
                     padded[12:16] + '-' + padded[16:20] + '-' + padded[20:32])

Problem: Date/time precision loss

Solution:

# MongoDB stores dates as milliseconds since epoch
# Ensure proper conversion
from datetime import datetime
def convert_date(mongo_date):
    """Preserve date precision during migration"""
    if isinstance(mongo_date, datetime):
        return mongo_date.isoformat()
    return mongo_date

10.5 Change Stream Issues

Problem: Change stream not receiving events

Solution:

# Ensure collection has proper CDC enabled
# Check via SQL
# SELECT * FROM pg_publication_tables WHERE pubname = 'mongodb_cdc';
# Restart change stream with explicit options
change_stream = collection.watch(
full_document='updateLookup', # Include full document on updates
max_await_time_ms=1000
)

10.6 Transaction Issues

Problem: Transaction aborted unexpectedly

Solution:

from pymongo.read_concern import ReadConcern
from pymongo.write_concern import WriteConcern

# Increase transaction timeout
with client.start_session() as session:
    with session.start_transaction(
        max_commit_time_ms=30000,  # 30 second timeout
        read_concern=ReadConcern("snapshot"),
        write_concern=WriteConcern("majority")
    ):
        # Transaction operations
        pass

10.7 Debugging Tools

# Enable MongoDB driver debugging
import logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger('pymongo')
logger.setLevel(logging.DEBUG)
-- Enable HeliosDB query logging
ALTER SYSTEM SET log_statement = 'all';
ALTER SYSTEM SET log_min_duration_statement = 100; -- Log queries > 100ms
SELECT pg_reload_conf();
-- View recent logs
SELECT * FROM pg_stat_statements ORDER BY total_time DESC LIMIT 20;

Appendix A: Quick Reference Card

Connection String Mapping

| MongoDB | HeliosDB |
| --- | --- |
| mongodb://host:27017/ | mongodb://host:27017/ |
| mongodb+srv://cluster/ | mongodb://heliosdb:27017/ |
| authSource=admin | authSource=admin |
| replicaSet=rs0 | Not needed (built-in HA) |

Common Query Translation

| MongoDB | HeliosDB SQL |
| --- | --- |
| find({}) | SELECT * FROM collection |
| find({a: 1}) | WHERE data->>'a' = '1' |
| find({a: {$gt: 1}}) | WHERE (data->>'a')::int > 1 |
| find({a: {$in: [1,2]}}) | WHERE data->>'a' IN ('1','2') |
| updateOne({}, {$set: {a: 1}}) | UPDATE SET data = jsonb_set(data, '{a}', '1') |
| aggregate([{$group: ...}]) | SELECT ... GROUP BY ... |

Useful HeliosDB JSON Functions

| Function | Purpose |
| --- | --- |
| data->>'field' | Extract field as text |
| data->'field' | Extract field as JSONB |
| data @> '{}' | JSONB containment |
| data ? 'key' | Key exists |
| jsonb_set() | Set nested value |
| jsonb_array_elements() | Unwind array |
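
Since HeliosDB also exposes a PostgreSQL-compatible SQL interface (the psql commands earlier in this guide rely on it), these functions can be exercised from application code as well. A hedged sketch using psycopg2 with placeholder connection details:

import psycopg2

conn = psycopg2.connect(host="heliosdb-server", dbname="mydb", user="admin", password="mypassword")
with conn, conn.cursor() as cur:
    # Extract a field as text and filter with JSONB containment in one query
    cur.execute("""
        SELECT data->>'username', data->'profile'
        FROM users
        WHERE data @> '{"status": "active"}'
        LIMIT 5
    """)
    for username, profile in cur.fetchall():
        print(username, profile)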

Appendix B: Migration Checklist Summary

Pre-Migration

  • Database inventory completed
  • Data volume estimated
  • GridFS usage identified and alternative planned
  • Backup created and verified
  • Test environment prepared

Migration

  • Schema designed (pure document, hybrid, or relational)
  • Data exported from MongoDB
  • Data imported to HeliosDB
  • Indexes recreated
  • Connection strings updated

Post-Migration

  • Document counts verified
  • Sample data validated
  • All queries tested
  • Performance benchmarked
  • Application tests passed
  • Change streams working
  • Monitoring configured


Last Updated: 2026-01-04 Migration Guide Version: 1.0 HeliosDB Compatibility: 7.0+