A deep-dive into DataOcean's architecture — 605+ tables across 48 domains, six foundational design patterns, 4-layer authorization, enterprise compliance, and real-time observability.
Eight core domains examined in detail — their purpose, key tables, concepts, and enterprise use cases.
A flexible, schema-on-write data collection layer. Atlas Forms lets you define arbitrary forms at runtime, without schema migrations. It stores structured form definitions, field configurations, and submitted data records in a normalised model.
The largest domain. A complete process orchestration substrate — from visual workflow definition to runtime execution tracing. Supports 60+ node types covering decisions, AI agents, human tasks, API calls, timers, loops, and parallel branches.
Enterprise identity and access management. Covers user accounts, multi-organization membership, RBAC roles and permissions, SSO federation, API key management, session tracking, and MFA — all multi-tenant from the ground up.
The compliance backbone. A small but critical domain that provides the security policy engine, capability policy enforcement, and immutable audit log infrastructure that underpins SOX, HIPAA, and GDPR compliance across all other domains.
Everything needed to run a commercial or internal marketplace — package publishing, versioning, review and rating systems, download tracking, publisher verification, and licence management. Supports SaaS app stores, agent marketplaces, and enterprise app distribution.
A reward, reputation, and community engagement engine. Supports gamification (points, badges, levels), community reputation scoring, reward grant management, and rule-based reward automation — suitable for developer communities, customer portals, and partner programs.
A complete billing and payments substrate. Supports subscription billing, one-time transactions, invoice generation, multiple payment gateway integrations, payment method management, and refund processing — with full audit trail on every financial event.
Four sub-domains covering the full AI automation stack: Agent definitions and teams, Channel routing and message queues, Conversation history and states, and the Extended Systems layer (AIFunction, AILLM, AIMCP, AIMonitor, AIScheduler). Together they form a production-grade AI backend.
Every table in DataOcean follows these patterns — applied consistently across all 605 tables, all 48 domains, from day one.
Every table carries Deleted BIT NOT NULL DEFAULT 0 and Archived BIT NOT NULL DEFAULT 0. Hard deletes are never issued. Filtered indexes on Deleted=0 keep active-record queries at full speed.
Why it matters: Data deleted by mistake can be recovered. Compliance regulations require data retention. Referential integrity is never broken by a missing row.
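The soft-delete pattern can be sketched in a few lines. This is a minimal illustration, not DataOcean's actual DDL: it assumes Python's built-in sqlite3 module as a stand-in for SQL Server, a SQLite partial index in place of a filtered index, INTEGER in place of BIT, and a hypothetical Customer table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY,
    Name       TEXT    NOT NULL,
    Deleted    INTEGER NOT NULL DEFAULT 0,  -- BIT in SQL Server
    Archived   INTEGER NOT NULL DEFAULT 0
);
-- Partial index: SQLite's counterpart of a SQL Server filtered index.
CREATE INDEX IX_Customer_Active ON Customer (Name) WHERE Deleted = 0;
""")

def soft_delete(customer_id: int) -> None:
    """Flag the row as deleted instead of issuing a hard DELETE."""
    conn.execute("UPDATE Customer SET Deleted = 1 WHERE CustomerID = ?", (customer_id,))

def restore(customer_id: int) -> None:
    """Undo a soft delete; the row was never physically removed."""
    conn.execute("UPDATE Customer SET Deleted = 0 WHERE CustomerID = ?", (customer_id,))

def active_names() -> list[str]:
    """Active-record query: always filters on Deleted = 0."""
    rows = conn.execute("SELECT Name FROM Customer WHERE Deleted = 0 ORDER BY Name")
    return [r[0] for r in rows]

conn.executemany("INSERT INTO Customer (Name) VALUES (?)", [("Acme",), ("Globex",)])
soft_delete(1)          # "Acme" disappears from active queries
print(active_names())   # ['Globex']
```

Calling restore(1) brings the row back into active queries, which is the recovery path a hard delete would not allow.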
All 605 tables carry CreatedOn, CreatedBy, LastModifiedOn, and LastModifiedBy. 100+ triggers record every INSERT, UPDATE, and state change into the AuditLog table automatically.
Why it matters: SOX, HIPAA, and GDPR all require demonstrable data lineage. The audit trail is built into the schema, not bolted on as a later layer.
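A trigger-based audit trail of this kind might look like the sketch below. It is illustrative only: SQLite (via Python's sqlite3) stands in for SQL Server, the Invoice table and the AuditLog columns other than the four audit fields named above are hypothetical, and only INSERT and UPDATE are covered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Invoice (
    InvoiceID      INTEGER PRIMARY KEY,
    AmountCents    INTEGER NOT NULL,
    CreatedOn      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    CreatedBy      TEXT NOT NULL,
    LastModifiedOn TEXT,
    LastModifiedBy TEXT
);
-- Hypothetical AuditLog layout; the real table is not described in detail.
CREATE TABLE AuditLog (
    AuditID   INTEGER PRIMARY KEY,
    TableName TEXT NOT NULL,
    RecordID  INTEGER NOT NULL,
    Action    TEXT NOT NULL,
    ChangedBy TEXT,
    ChangedOn TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER trg_Invoice_Insert AFTER INSERT ON Invoice
BEGIN
    INSERT INTO AuditLog (TableName, RecordID, Action, ChangedBy)
    VALUES ('Invoice', NEW.InvoiceID, 'INSERT', NEW.CreatedBy);
END;
CREATE TRIGGER trg_Invoice_Update AFTER UPDATE ON Invoice
BEGIN
    INSERT INTO AuditLog (TableName, RecordID, Action, ChangedBy)
    VALUES ('Invoice', NEW.InvoiceID, 'UPDATE', NEW.LastModifiedBy);
END;
""")

# The application only touches Invoice; the triggers write AuditLog.
conn.execute("INSERT INTO Invoice (AmountCents, CreatedBy) VALUES (?, ?)", (12500, "alice"))
conn.execute(
    "UPDATE Invoice SET AmountCents = ?, LastModifiedOn = CURRENT_TIMESTAMP, "
    "LastModifiedBy = ? WHERE InvoiceID = 1",
    (13000, "bob"),
)

def audit_entries() -> list[tuple[str, str]]:
    """Return (Action, ChangedBy) pairs in the order they were recorded."""
    rows = conn.execute("SELECT Action, ChangedBy FROM AuditLog ORDER BY AuditID")
    return [(r[0], r[1]) for r in rows]
```

Because the log rows are written by the database rather than by application code, a caller cannot skip the audit step by forgetting to call a logging helper.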
TenantID UNIQUEIDENTIFIER NOT NULL appears on every table. Indexes use TenantID as the leading key, and all stored procedures and views filter on it. Row-level security policies enforce tenant isolation at the engine level.
Why it matters: Retrofitting TenantID onto a production schema is a catastrophic migration. DataOcean ships with it already on every table.
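The tenant-first layout can be illustrated roughly as follows. This is a sketch under stated assumptions: SQLite has no row-level security, so a Python helper applies the tenant predicate that a SQL Server security policy would apply inside the engine; the Project table and the helper name are hypothetical.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Project (
    ProjectID INTEGER PRIMARY KEY,
    TenantID  TEXT NOT NULL,      -- UNIQUEIDENTIFIER in SQL Server
    Name      TEXT NOT NULL
);
-- TenantID is the leading index key, so per-tenant lookups stay narrow.
CREATE INDEX IX_Project_Tenant_Name ON Project (TenantID, Name);
""")

TENANT_A = str(uuid.uuid4())
TENANT_B = str(uuid.uuid4())
conn.executemany(
    "INSERT INTO Project (TenantID, Name) VALUES (?, ?)",
    [(TENANT_A, "Apollo"), (TENANT_B, "Borealis"), (TENANT_A, "Cobalt")],
)

def projects_for(tenant_id: str) -> list[str]:
    """Read projects for one tenant only.

    Stand-in for an engine-side filter predicate: every read is bound
    to exactly one TenantID.
    """
    rows = conn.execute(
        "SELECT Name FROM Project WHERE TenantID = ? ORDER BY Name", (tenant_id,)
    )
    return [r[0] for r in rows]
```

With engine-level row-level security the same predicate would be attached to the table itself, so it would not depend on callers remembering to use a helper like projects_for.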
Tables with variable configuration use ConfigJSON NVARCHAR(MAX) columns with JSON Schema validation via check constraints. This avoids EAV anti-patterns while still supporting flexible, per-record configuration without schema changes.
Why it matters: Extensibility without endless schema migrations. Configuration evolves at runtime; the schema stays stable.
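As a rough illustration of a JSON column guarded by a check constraint, the sketch below uses SQLite's json_valid function (assumed to be available in the local SQLite build) where SQL Server would use ISJSON. It checks only that the value is well-formed JSON, not conformance to a JSON Schema, and the WorkflowNode table and its columns are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE WorkflowNode (
    NodeID     INTEGER PRIMARY KEY,
    NodeType   TEXT NOT NULL,
    -- NVARCHAR(MAX) with an ISJSON check in SQL Server
    ConfigJSON TEXT NOT NULL CHECK (json_valid(ConfigJSON))
)
""")

def add_node(node_type: str, config_json: str) -> bool:
    """Insert a node; return False if the check constraint rejects the JSON."""
    try:
        conn.execute(
            "INSERT INTO WorkflowNode (NodeType, ConfigJSON) VALUES (?, ?)",
            (node_type, config_json),
        )
        return True
    except sqlite3.IntegrityError:
        return False

def node_config(node_id: int) -> dict:
    """Load and parse the per-record configuration."""
    row = conn.execute(
        "SELECT ConfigJSON FROM WorkflowNode WHERE NodeID = ?", (node_id,)
    ).fetchone()
    return json.loads(row[0])

# A new setting is just a new key; no ALTER TABLE is needed.
ok = add_node("timer", json.dumps({"delaySeconds": 30}))
rejected = add_node("timer", "{not valid json")
```

The malformed value never reaches the table, so readers of ConfigJSON can parse it without defensive error handling.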
SourceAppID, ClientAccountID, AppDomainID, DataDomainID, and DataSegmentID provide five independent segmentation axes on every table. Zero-code data partitioning for multi-product and multi-client deployments.
Why it matters: B2B2C architectures, white-label SaaS, and multi-product platforms all need clean data partitioning — without custom schemas for each product.
ResID UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID() provides a stable, globally unique identifier on every row. Surrogate integer PKs are for local joins; ResID is for external references, replication, API responses, and distributed system federation.
Why it matters: Systems that need to expose stable external IDs — APIs, webhooks, cross-system sync — don't leak internal surrogate keys.
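The split between an internal surrogate key and an external ResID might be sketched like this. The Package table, the function names, and the response shape are hypothetical, and because SQLite has no NEWID() default the identifier is generated in Python instead of by the database.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Package (
    PackageID INTEGER PRIMARY KEY,   -- internal surrogate key, used for joins
    ResID     TEXT NOT NULL UNIQUE,  -- UNIQUEIDENTIFIER DEFAULT NEWID() in SQL Server
    Name      TEXT NOT NULL
)
""")

def create_package(name: str) -> str:
    """Insert a row and return its public ResID."""
    res_id = str(uuid.uuid4())  # generated here because SQLite lacks NEWID()
    conn.execute("INSERT INTO Package (ResID, Name) VALUES (?, ?)", (res_id, name))
    return res_id

def to_api_response(res_id: str) -> dict:
    """Look up by ResID and expose only the external identifier."""
    row = conn.execute(
        "SELECT ResID, Name FROM Package WHERE ResID = ?", (res_id,)
    ).fetchone()
    if row is None:
        raise KeyError(res_id)
    return {"id": row[0], "name": row[1]}

public_id = create_package("report-builder")
response = to_api_response(public_id)
```

The integer PackageID never appears in the response, so row counts and insertion order are not exposed to external callers.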
DataOcean sits at the bottom of a clean, layered stack. Each layer has a single responsibility — no logic leaks between them.
Enterprise regulatory requirements are architectural decisions — not afterthoughts. DataOcean's schema was designed to satisfy them from the first table.
Immutable audit trail on all financial tables. CreatedBy and LastModifiedBy fields are set by the database layer and cannot be bypassed by application code. Full change history for every transaction, approval, and override.
Access audit logging via triggers on all protected data tables. Row-level tenant isolation prevents cross-tenant data exposure. Soft delete ensures data retention for the required periods. Encryption at rest supported by SQL Server TDE.
DataSegmentID and TenantID enable geographic data partitioning. Soft delete supports right-to-be-forgotten workflows. Data processing records are maintained in the Compliance domain. JSON configuration columns support consent tracking.
SQL Server Row-Level Security policies enforce TenantID isolation inside the database engine, so queries issued over direct database connections are filtered as well. Application-layer security is a second defence, not the first.
Encryption at rest via SQL Server Transparent Data Encryption (TDE). Encryption in transit via enforced TLS for all connections. Sensitive column encryption available via SQL Server Always Encrypted for PII and credential fields.
RBAC privilege system, ABAC scope hierarchy, node-level capability policies, and actor assignment gates. Four independently configurable authorization layers that compose to cover any enterprise access control requirement without custom code.
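One way to picture how four independent layers compose is a chain of checks that must all pass. The sketch below is purely illustrative: the AccessRequest fields, the scope-path convention, and the layer functions are assumptions made for this example, not DataOcean's actual model.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class AccessRequest:
    actor: str
    privileges: frozenset         # privileges granted through the actor's roles
    actor_scope: str              # e.g. "org/emea"
    resource_scope: str           # e.g. "org/emea/sales"
    node_capabilities: frozenset  # capabilities the target node permits
    assigned_actors: frozenset    # actors assigned to the task
    required_privilege: str
    required_capability: str

def rbac(r: AccessRequest) -> bool:
    return r.required_privilege in r.privileges

def abac_scope(r: AccessRequest) -> bool:
    # The resource must sit at or below the actor's scope in the hierarchy.
    return r.resource_scope == r.actor_scope or r.resource_scope.startswith(r.actor_scope + "/")

def capability_policy(r: AccessRequest) -> bool:
    return r.required_capability in r.node_capabilities

def actor_assignment(r: AccessRequest) -> bool:
    return r.actor in r.assigned_actors

LAYERS = [
    ("rbac", rbac),
    ("abac_scope", abac_scope),
    ("capability_policy", capability_policy),
    ("actor_assignment", actor_assignment),
]

def first_denial(r: AccessRequest) -> Optional[str]:
    """Return the name of the first layer that denies, or None if all allow."""
    for name, check in LAYERS:
        if not check(r):
            return name
    return None

request = AccessRequest(
    actor="dana",
    privileges=frozenset({"invoice.approve"}),
    actor_scope="org/emea",
    resource_scope="org/emea/sales",
    node_capabilities=frozenset({"approve"}),
    assigned_actors=frozenset({"dana"}),
    required_privilege="invoice.approve",
    required_capability="approve",
)
out_of_scope = replace(request, resource_scope="org/apac/sales")
```

Here first_denial(request) returns None, while first_denial(out_of_scope) names the scope layer; reporting which layer denied tends to make access issues easier to diagnose than a bare yes or no.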
15 dedicated observability tables give you real-time visibility into every execution, error, and performance metric — without bolting on an external APM tool.
PerformanceMetric table captures query times, execution durations, and resource utilisation with millisecond precision. Aggregation views provide dashboard-ready summaries.
ErrorTracking table stores structured error events with stack traces, severity levels, and correlation IDs. Linked to ProcessExecutionTrace for full root-cause context.
Indexed views aggregate performance data by time window, tenant, and operation type. Pre-built stored procedures return dashboard-ready datasets without custom reporting queries.
ExecutionLog captures every process and workflow run with start time, end time, status, and result payload. Supports full replay debugging and SLA measurement across all workflow types.
Time-series data partitioned by date for efficient range queries. Rolling aggregates maintained via indexed views. Trend analysis covers throughput, error rates, and latency over configurable time windows.
AlertRule table defines threshold-based and anomaly-based alert conditions. AlertEvent table records every triggered alert with context. Integrates with AI Channel domain for multi-channel alert delivery.
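Threshold-based rule evaluation could look roughly like the sketch below. The field names on AlertRule and AlertEvent, the comparator set, and the metric names are hypothetical, and anomaly-based conditions and alert delivery are out of scope for this example.

```python
import operator
from dataclasses import dataclass
from typing import Optional

COMPARATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}

@dataclass(frozen=True)
class AlertRule:
    name: str
    metric: str
    comparator: str   # one of the COMPARATORS keys
    threshold: float

@dataclass(frozen=True)
class AlertEvent:
    rule_name: str
    metric: str
    observed: float
    threshold: float

def evaluate(rule: AlertRule, metrics: dict) -> Optional[AlertEvent]:
    """Return an AlertEvent if the rule fires for this metrics sample."""
    observed = metrics.get(rule.metric)
    if observed is None:
        return None  # metric not reported in this sample
    if COMPARATORS[rule.comparator](observed, rule.threshold):
        return AlertEvent(rule.name, rule.metric, observed, rule.threshold)
    return None

rules = [
    AlertRule("high-error-rate", "error_rate", ">", 0.05),
    AlertRule("slow-p95", "p95_latency_ms", ">=", 800.0),
]
sample = {"error_rate": 0.08, "p95_latency_ms": 420.0}
events = [e for e in (evaluate(r, sample) for r in rules) if e is not None]
```

Each event carries the observed value and the threshold it crossed, which is the kind of context a triggered alert needs when it is recorded.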
Talk to the team about licensing, deployment options, and how DataOcean fits into your existing architecture.