MinIO AIStor is MinIO's enterprise rebrand -- the same S3-compatible object store with Iceberg table support, webhook-based "AI hooks" (promptObject), and integrations with external ML services. It positions itself as "AI storage infrastructure." HeliosDB is a unified AI-native database that embeds SQL, vector search, RAG pipelines, ML training, anomaly detection, and autonomous agents directly inside the engine. The difference is fundamental: AIStor connects to AI tools; HeliosDB is an AI tool.


Quick Comparison

Feature | MinIO AIStor | HeliosDB
Primary purpose | S3 object store + Iceberg tables | Unified AI-native database
Query language | S3 Select (single-file filtering) | Full SQL (JOINs, CTEs, window functions, subqueries)
Vector search | External (Milvus, LanceDB) | Native HNSW + IVF + multimodal, SIMD-accelerated
RAG pipelines | promptObject hooks to external LLMs | Native 6-strategy chunking + retrieval + reranking + LLM
ML model training | External (PyTorch, TensorFlow) | Built-in 10+ algorithms (regression, classification, clustering, anomaly)
Autonomous agents | None | 5 GOAP-based cognitive agents (tuning, healing, quality, optimizer, security)
Anomaly detection | None | Native z-score, isolation forest, drift detection
Natural language queries | None | NL2SQL + conversational BI
ML-based data tiering | Manual lifecycle rules | 62-feature access pattern model, auto hot→warm→cold
Threat detection | None | Pattern + behavioral analysis on ingest
Protocol adapters | S3 + Iceberg REST catalog | 14 protocols (PG, MySQL, Redis, MongoDB, Oracle, SQL Server, Cassandra...)
ACID transactions | Single-object atomicity | Multi-statement MVCC (Serializable isolation)
Triggers / UDFs | None (webhook-based events) | PL/pgSQL triggers + WASM user-defined functions
Graph database | None | Cypher + GQL with property graph model
Time-travel | Object versioning (linear) | MVCC snapshots + git-like branching + merge
Encryption | SSE-S3 / SSE-KMS / SSE-C | TDE + Zero-Knowledge Encryption + post-quantum readiness
Compliance | Basic (SOC2 via enterprise contract) | GDPR / HIPAA / SOC2 / PCI-DSS / FedRAMP built-in controls
Deployment | 4+ nodes minimum for production | Single binary to sharded cluster
Pricing | ~$96K/year enterprise license | Open source + $5K-$20K managed tier

"AI-Native" -- What It Actually Means

MinIO AIStor calls itself "AI storage infrastructure." In practice, this means it stores objects and fires webhooks to external AI services. Every intelligence capability -- embeddings, search, training, inference -- lives in a separate process that you deploy, configure, and maintain yourself. AIStor is a storage layer with hooks, not an AI system.

HeliosDB is different. The AI capabilities are compiled into the same binary as the storage engine. There is no network hop between your data and your ML model. No serialization/deserialization between the vector index and the SQL optimizer. No separate service to crash at 3 AM.
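The cost the co-located design avoids is easiest to see in miniature: every cross-service hop pays a serialize/transfer/deserialize cycle that an in-process call never does. A toy illustration (not HeliosDB internals; the function names are invented for the sketch):

```python
import json

def score_in_process(embedding, query):
    """Direct function call: the data never leaves the process."""
    return sum(a * b for a, b in zip(embedding, query))

def score_over_the_wire(embedding, query):
    """Simulates a cross-service hop: serialize on the client, 'transfer',
    deserialize on the server, then do the exact same work."""
    payload = json.dumps({"embedding": embedding, "query": query})  # client side
    received = json.loads(payload)                                  # server side
    return sum(a * b for a, b in zip(received["embedding"], received["query"]))

embedding = [0.1] * 768
query = [0.2] * 768

# Same answer either way -- the hop only adds latency and a failure mode.
assert abs(score_in_process(embedding, query)
           - score_over_the_wire(embedding, query)) < 1e-9
```

In production the "transfer" is a real network round trip plus a second process that can crash independently, which is the point of the comparison.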

MinIO AIStor: Storage + External AI Services

+------------------+
|   MinIO AIStor   |   (stores objects, fires webhooks)
+--------+---------+
         |
         | promptObject / webhooks / S3 notifications
         |
    +----+-----+     +-----------+     +------------+
    | PyTorch  |     |  Milvus   |     | LangChain  |
    | (train)  |     | (vectors) |     |   (RAG)    |
    +----------+     +-----------+     +------------+
         |                |                  |
    +----+------+    +----+------+    +-----+------+
    | TensorFlow|    |  LanceDB  |    | External   |
    | (infer)   |    | (backup)  |    |   LLM API  |
    +-----------+    +-----------+    +------------+

 6+ services to deploy, monitor, version, and keep alive
 No ACID across the pipeline
 Latency: network hop on every AI operation
 Failure domain: any service down = pipeline broken

HeliosDB: Single Binary, Everything Built In

+-----------------------------------------------------+
|                     HeliosDB                         |
|                                                     |
|  +-------+ +--------+ +--------+ +-------+ +-----+ |
|  |  SQL  | | Vector | |  RAG   | |  ML   | |  NL | |
|  |Engine | | Search | |Pipeline| |Train  | |Query| |
|  +---+---+ +---+----+ +---+----+ +---+---+ +--+--+ |
|      |         |           |          |        |    |
|  +---+---------+-----------+----------+--------+-+  |
|  |          MVCC Storage Engine                  |  |
|  |  (ACID + TDE + WAL + Branching + Tiering)     |  |
|  +-----------------------------------------------+  |
|                                                     |
|  5 Autonomous Agents (tuning, healing, quality,     |
|  optimization, security) -- always running          |
+-----------------------------------------------------+

 1 binary, 1 process, 0 external dependencies
 All operations ACID-compliant
 Latency: function call, not network call
 Failure domain: one thing to monitor

Vector Search: External vs Native

MinIO AIStor: Store Embeddings Elsewhere

AIStor has no vector index. To do similarity search, you store embeddings in Milvus, LanceDB, or Weaviate -- services that use AIStor as their blob backend. The result is two separate query paths with no ACID consistency between them.

# AIStor + Milvus: two systems, two queries, no ACID
import boto3
from openai import OpenAI
from pymilvus import MilvusClient

s3 = boto3.client('s3', endpoint_url='http://aistor:9000')
openai = OpenAI()
milvus = MilvusClient(uri='http://milvus:19530')

# Step 1: Store raw document in AIStor
s3.put_object(Bucket='docs', Key='report.pdf', Body=pdf_bytes)

# Step 2: Generate embedding externally
embedding = openai.embeddings.create(input=text, model='text-embedding-3-small')

# Step 3: Store embedding in Milvus (which itself uses AIStor for persistence)
milvus.insert(collection_name='documents', data=[{
    'embedding': embedding.data[0].embedding,
    's3_key': 'report.pdf'
}])

# Step 4: Search requires querying Milvus, then fetching from AIStor
results = milvus.search(
    collection_name='documents',
    data=[query_embedding],
    limit=5,
    output_fields=['s3_key']
)

# Step 5: Fetch actual documents from AIStor
for hit in results[0]:
    obj = s3.get_object(Bucket='docs', Key=hit['entity']['s3_key'])
    # ... process

# Problems:
#  - Document deleted from AIStor? Milvus still returns it (stale index)
#  - Milvus insert fails? Document exists without embedding (orphan)
#  - No transactional guarantee across the two systems
#  - Network round-trip on every search + fetch

HeliosDB: Vector Search Inside the Database

-- Everything in one transaction, one system

-- Store document with embedding
INSERT INTO documents (title, content, embedding, metadata)
VALUES (
    'Q4 Report',
    'Revenue grew 34% year-over-year...',
    '[0.12, -0.45, 0.89, ...]'::vector(768),
    '{"department": "finance", "quarter": "Q4"}'
);

-- Vector similarity search with SQL filters -- one query
SELECT title, content,
       embedding <-> $1::vector(768) AS distance
FROM documents
WHERE metadata @> '{"department": "finance"}'
ORDER BY embedding <-> $1::vector(768)
LIMIT 10;

-- Hybrid search: combine FTS + vector + metadata
SELECT title, content
FROM documents
WHERE content @@ 'revenue AND growth'
  AND embedding <-> $1::vector(768) < 0.4
  AND metadata @> '{"quarter": "Q4"}'
ORDER BY embedding <-> $1::vector(768);

-- Via Redis protocol (RESP2/RESP3):
-- VSEARCH documents 768 query_vec K 10 FILTER department=finance
-- ACID guarantee: delete cascades to vector index automatically
DELETE FROM documents WHERE title = 'Q4 Report';
-- No orphan embeddings, no stale search results, no cleanup scripts

RAG Pipelines

MinIO AIStor: promptObject -- A Thin Wrapper

AIStor's "AI feature" is promptObject: a webhook-style API that sends an object's content to an external LLM and returns the response. It does not chunk, embed, index, rerank, or cache. It is an HTTP proxy to an LLM endpoint.

# AIStor promptObject: send object to external LLM
import requests

response = requests.post('http://aistor:9000/prompt', json={
    'bucket': 'docs',
    'key': 'report.pdf',
    'prompt': 'Summarize this document',
    'model': 'gpt-4'  # external LLM required
})

# What promptObject does NOT do:
#  - Chunk the document intelligently
#  - Generate or store embeddings
#  - Perform multi-source retrieval
#  - Rerank results by relevance
#  - Cache responses
#  - Support graph-based RAG
#  - Handle multi-turn conversations
#
# For real RAG, you still need:
#  LangChain + Milvus + embedding API + chunking logic + reranking model

HeliosDB: Full RAG Pipeline in SQL

-- 1. Ingest: automatic chunking with 6 strategies
INSERT INTO knowledge_base (source, content, metadata)
VALUES ('report.pdf', pg_read_file('/data/report.pdf'), '{"type": "finance"}');
-- Triggers: auto-chunk (semantic, sliding window, paragraph, sentence,
--           fixed-size, recursive), auto-embed, auto-index

-- 2. Multi-source retrieval with reranking
WITH candidates AS (
    -- Vector search: semantic similarity
    SELECT id, content, embedding <-> $1::vector(768) AS vec_score
    FROM knowledge_base
    WHERE metadata @> '{"type": "finance"}'
    ORDER BY embedding <-> $1::vector(768)
    LIMIT 50
),
fts_boost AS (
    -- Full-text search: keyword relevance
    SELECT c.id, c.content, c.vec_score,
           ts_rank(to_tsvector(c.content), plainto_tsquery($2)) AS text_score
    FROM candidates c
    WHERE to_tsvector(c.content) @@ plainto_tsquery($2)
),
reranked AS (
    -- Reciprocal rank fusion
    SELECT id, content,
           0.6 * (1.0 / (60 + ROW_NUMBER() OVER (ORDER BY vec_score)))
         + 0.4 * (1.0 / (60 + ROW_NUMBER() OVER (ORDER BY text_score DESC)))
           AS rrf_score
    FROM fts_boost
)
SELECT id, content, rrf_score
FROM reranked
ORDER BY rrf_score DESC
LIMIT 5;

-- 3. Graph RAG: traverse document relationships
SELECT d.content
FROM knowledge_graph g
JOIN knowledge_base d ON d.id = g.target_id
WHERE g.source_id = $1
  AND g.relation = 'references'
  AND g.confidence > 0.8;

-- 4. Conversational context: session-aware retrieval
SELECT content FROM rag_retrieve(
    query     := 'What was Q4 revenue?',
    session   := 'user-session-abc',
    strategy  := 'hybrid',
    top_k     := 5,
    rerank    := true
);
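The weighted reciprocal-rank fusion in step 2 (the `reranked` CTE) is plain arithmetic, and a hand-rolled Python sketch of the same scoring may make it easier to follow (illustrative only; `rrf_fuse` is not a HeliosDB API):

```python
def rrf_fuse(vec_ranking, text_ranking, w_vec=0.6, w_text=0.4, k=60):
    """Weighted reciprocal rank fusion: each ranking contributes
    weight / (k + rank) to a document's score, ranks starting at 1."""
    scores = {}
    for rank, doc_id in enumerate(vec_ranking, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + w_vec / (k + rank)
    for rank, doc_id in enumerate(text_ranking, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + w_text / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Doc 'a' ranks 1st by vectors and 2nd by keywords; 'b' the reverse.
fused = rrf_fuse(["a", "b", "c"], ["b", "a", "c"])
assert fused[0] == "a"  # the vector ranking carries more weight (0.6 vs 0.4)
```

The constant `k=60` matches the `60` in the SQL's denominators; it damps the advantage of the very top ranks so no single ranking dominates.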

Autonomous Intelligence

AIStor expects you to tune, scale, and secure your infrastructure manually. HeliosDB runs five autonomous cognitive agents that continuously optimize the system without human intervention.

Capability | MinIO AIStor | HeliosDB (Autonomous Agents)
Performance tuning | Manual: adjust erasure coding, drive layout, network | Tuning Agent: auto-adjusts buffer pool, WAL, compaction, parallelism
Self-healing | Erasure rebuild on drive failure (data only) | Healing Agent: detects corruption, restores from WAL, rebalances shards
Data quality | None -- blobs are opaque | Quality Agent: schema drift detection, null audits, outlier flagging
Query optimization | N/A (no query engine) | Optimizer Agent: adaptive re-optimization, auto-indexes, plan caching
Security response | Manual: review audit logs, set bucket policies | Security Agent: real-time threat scoring, auto-quarantine, compliance audit

-- HeliosDB: query what the agents are doing
SELECT agent_name, last_action, impact, timestamp
FROM system.agent_activity
ORDER BY timestamp DESC
LIMIT 20;

-- Example output:
-- tuning_agent    | Increased buffer_pool to 4GB   | +12% scan throughput | 2026-03-27 02:15:00
-- healing_agent   | Rebuilt shard 3 index (corrupt) | 0 data loss          | 2026-03-27 01:42:00
-- security_agent  | Quarantined 10.0.0.5 (brute)   | Blocked 847 attempts | 2026-03-27 01:30:00
-- quality_agent   | Flagged NULL spike in orders    | 14% NULL rate (was 2%)| 2026-03-27 01:15:00
-- optimizer_agent | Created index on orders(date)   | -340ms avg query     | 2026-03-27 00:45:00
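The NULL-spike flag from quality_agent in the output above is the kind of check z-score detection makes cheap. A minimal sketch of the idea (illustrative only, not the agent's actual implementation):

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # no variation, nothing to flag
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hourly NULL rates in the orders table: steady ~2%, then a 14% spike.
null_rates = [0.02, 0.021, 0.019, 0.02, 0.022, 0.018, 0.02, 0.14]
assert zscore_outliers(null_rates, threshold=2.0) == [0.14]
```

The real detector also runs isolation forests and drift detection, but the z-score pass is the cheap first filter that turns a column's history into an alert.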

S3 Compatible + More

HeliosDB Full includes an S3-compatible HTTP gateway that speaks the same API as AIStor -- so your existing AWS CLI, SDKs, and tools work unchanged. But HeliosDB layers intelligence on top.

# Standard AWS CLI works identically
aws --endpoint-url http://localhost:8443 s3 mb s3://data-lake
aws --endpoint-url http://localhost:8443 s3 cp dataset.parquet s3://data-lake/
aws --endpoint-url http://localhost:8443 s3 ls s3://data-lake/

# Threat scan result in response headers (HeliosDB exclusive)
curl -X PUT http://localhost:8443/data-lake/upload.bin --data-binary @upload.bin
# Response: x-helios-threat-status: clean
# Response: x-helios-threat-score: 0.02

-- SQL bucket management (no JSON policy files)
CREATE BUCKET ml_artifacts;
SHOW BUCKETS;

-- Natural language policies (vs AIStor's JSON policy documents)
ALTER BUCKET ml_artifacts SET POLICY 'keep GDPR-relevant data in EU regions only';
ALTER BUCKET ml_artifacts SET POLICY 'deny access outside business hours for PII buckets';

-- Presigned URLs via SQL (no SDK required)
SELECT presign_url('ml_artifacts', 'models/v2.pt', 3600);
-- Returns: https://helios:8443/ml_artifacts/models/v2.pt?X-Amz-Signature=...

-- ML-based lifecycle tiering (vs AIStor's manual rules)
ALTER BUCKET ml_artifacts SET LIFECYCLE AUTO;
-- 62-feature model monitors access patterns and migrates automatically:
-- hot (NVMe) -> warm (SSD) -> cold (HDD) -> archive (S3 Glacier-class)

-- Object Lock / WORM compliance
ALTER BUCKET ml_artifacts SET LOCK COMPLIANCE 365;
ALTER OBJECT ml_artifacts/audit/2026.log SET LEGAL HOLD ON;

-- AI threat scan
SCAN BUCKET ml_artifacts THREATS;
SHOW BUCKET ml_artifacts AI STATUS;

-- CDC events as queryable data (not just webhooks)
SELECT * FROM system.cdc_events
WHERE bucket = 'ml_artifacts'
  AND event_type = 'PUT'
  AND timestamp > NOW() - INTERVAL '1 hour';
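The `LIFECYCLE AUTO` behavior above can be pictured with a toy decision rule (illustrative only -- the shipped model scores 62 access-pattern features, not the two shown here):

```python
def pick_tier(access_count_30d, days_since_last_access):
    """Toy tiering heuristic: frequently-touched data stays hot,
    idle data sinks toward archive."""
    if days_since_last_access <= 1 and access_count_30d >= 100:
        return "hot"      # NVMe
    if days_since_last_access <= 7:
        return "warm"     # SSD
    if days_since_last_access <= 90:
        return "cold"     # HDD
    return "archive"      # S3 Glacier-class

assert pick_tier(access_count_30d=500, days_since_last_access=0) == "hot"
assert pick_tier(access_count_30d=2, days_since_last_access=45) == "cold"
assert pick_tier(access_count_30d=0, days_since_last_access=400) == "archive"
```

The point of the ML version is that the thresholds are learned per workload rather than hand-tuned, and objects migrate without anyone writing lifecycle rules.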

When to Choose Each

Choose MinIO AIStor When:

  • Petabyte-scale blob storage is the primary need -- You are storing training datasets, model checkpoints, video archives, or backup tarballs and need raw throughput at minimal cost
  • You already have a mature ML platform -- Your team runs Kubeflow, MLflow, or SageMaker and needs a fast S3-compatible backend for those tools
  • Iceberg table catalog -- You use Apache Iceberg with Spark/Trino/Flink and need a catalog + storage backend
  • Erasure-coded durability -- You want drive-failure tolerance without full replication overhead on large binary objects
  • Your AI code is already written -- Your pipeline is in production with LangChain, Milvus, and PyTorch -- and you just need better storage under it

Choose HeliosDB When:

  • AI capabilities belong inside the database -- You want vector search, RAG, ML training, and anomaly detection without deploying 5+ external services
  • You need to query your data -- Full SQL with JOINs, CTEs, window functions, aggregations, and subqueries across all data types
  • ACID transactions across AI operations -- Document insert + embedding + vector index update in one atomic transaction
  • Autonomous operations -- Self-tuning, self-healing, and self-securing without a dedicated infrastructure team
  • Multi-protocol access -- The same data accessible via PostgreSQL, Redis, MongoDB, REST, gRPC, and 10 other wire protocols
  • Simpler architecture -- One binary instead of AIStor + Milvus + LangChain + PyTorch + LLM API + monitoring stack
  • Cost -- Open source core vs. $96K/year enterprise license
  • Compliance built in -- GDPR, HIPAA, SOC2, PCI-DSS, and FedRAMP controls in the engine, not bolted on via policy documents

Migration Path

Switching from AIStor to HeliosDB's S3 gateway requires changing one URL. Your existing S3 SDKs, CLI tools, and applications work unchanged.

Step 1: Point Your Endpoint

# Before (AIStor)
aws --endpoint-url http://aistor-cluster:9000 s3 ls

# After (HeliosDB)
aws --endpoint-url http://heliosdb:8443 s3 ls

# Environment variable approach
export AWS_ENDPOINT_URL=http://heliosdb:8443
aws s3 ls  # same commands, new backend

Step 2: Migrate Existing Buckets

-- One-command migration from AIStor
MIGRATE BUCKET FROM 'http://aistor-cluster:9000'
    CREDENTIALS ('aistor-access-key', 'aistor-secret-key')
    BUCKET data_lake;

-- Monitor progress
SHOW MIGRATION STATUS;

-- Verify
SHOW BUCKETS;
SELECT COUNT(*) FROM _s3_bucket_data_lake;

Step 3: Enable AI Features (What You Came For)

-- Now that your data is in HeliosDB, turn on intelligence

-- Auto-embed all documents for vector search
ALTER TABLE _s3_bucket_data_lake ENABLE AUTO_EMBED;

-- Enable ML-based tiering (replaces manual lifecycle rules)
ALTER BUCKET data_lake SET LIFECYCLE AUTO;

-- Enable threat scanning on ingest
ALTER BUCKET data_lake SET THREAT_SCAN ON;

-- Create NL policies (replaces JSON policy documents)
ALTER BUCKET data_lake SET POLICY 'deny downloads of PII outside VPN';

-- Query your objects with SQL (impossible in AIStor)
SELECT key, size, last_modified,
       metadata->>'content-type' AS type
FROM _s3_bucket_data_lake
WHERE size > 1048576
  AND last_modified > NOW() - INTERVAL '7 days'
ORDER BY size DESC;

Step 4: Python Application (Zero Code Changes for S3)

import boto3

# Only the endpoint URL changes
s3 = boto3.client('s3',
    endpoint_url='http://heliosdb:8443',  # was: http://aistor:9000
    aws_access_key_id='helios-key',
    aws_secret_access_key='helios-secret'
)

# All existing S3 operations work identically
s3.put_object(Bucket='data-lake', Key='new-file.csv', Body=csv_data)
obj = s3.get_object(Bucket='data-lake', Key='new-file.csv')

# NEW: also query via PostgreSQL protocol for AI features
import psycopg2
conn = psycopg2.connect("host=heliosdb port=5432 dbname=helios")
cur = conn.cursor()
cur.execute("""
    SELECT key, embedding <-> %s::vector(768) AS distance
    FROM _s3_bucket_data_lake
    ORDER BY embedding <-> %s::vector(768)
    LIMIT 5
""", (query_vec, query_vec))
results = cur.fetchall()

Summary

Dimension | MinIO AIStor | HeliosDB
Architecture | S3 store + webhooks to external AI | Unified engine: SQL + vectors + RAG + ML + agents
AI approach | Hooks to external services (bolted on) | Compiled into the binary (built in)
Vector search | Requires Milvus / LanceDB | Native HNSW + IVF with SIMD + PQ
RAG | promptObject (LLM proxy) | 6 chunking strategies, reranking, graph RAG
ML training | External (PyTorch/TF) | Built-in 10+ algorithms
Autonomous ops | Manual tuning and monitoring | 5 cognitive agents, always running
Query power | S3 Select (single file) | Full SQL optimizer, 69+ plan nodes
Transactions | Single-object | Multi-statement MVCC (Serializable)
Protocols | S3 + Iceberg REST | 14 wire protocols
Encryption | SSE-S3 / KMS / C | TDE + ZKE + post-quantum
Compliance | Basic / contract-based | GDPR / HIPAA / SOC2 / PCI-DSS / FedRAMP
Deployment | 4+ node cluster | Single binary to cluster
Cost | ~$96K/year enterprise | Open source + $5K-$20K managed
S3 compatible | Yes (native) | Yes (gateway) + SQL bucket management
Migration | -- | One endpoint URL change + MIGRATE BUCKET FROM

MinIO AIStor is excellent infrastructure for teams that already have a mature AI/ML platform and need a fast S3 backend. HeliosDB is for teams that want the AI capabilities inside the database -- no external services, no pipeline glue code, no 3 AM pages from Milvus crashing. With the S3-compatible gateway, HeliosDB can serve as a drop-in replacement that adds intelligence on day one, or work alongside AIStor as the query and AI layer over your existing object storage.

Ready to try HeliosDB?

Get started with HeliosDB in minutes. Open source, free to use.

Get Started Contact Sales