Architecture

AI OS is built around three pillars: the Orchestrator, the Agent Fleet, and the Memory Vault. Every request flows through a multi-model routing layer that selects the optimal AI provider for each task.

System Topology

The platform is organized into three vertical layers:

  • Public Layer — Landing page, legal pages, documentation. Accessible without authentication.
  • Authenticated Layer — Dashboard with 16+ views, gated behind Stripe subscription and session cookie.
  • Intelligence Layer — Orchestrator, agent fleet, multi-model routing, and memory vault. Accessed exclusively through the backend API.

Request Flow

When a user initiates a task from the dashboard:

  1. The browser sends an HTTP request to the Express backend.
  2. Middleware validates the session cookie and checks rate limits.
  3. The Orchestrator (Opus 4.8 at xhigh effort) analyzes the request and determines which agent(s) to invoke.
  4. The effort router selects an effort level (xhigh, high, or low) based on task complexity and cost profile.
  5. The agent executes, optionally reading from or writing to the Memory Vault.
  6. Results stream back over WebSocket for real-time dashboard updates.
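
Step 2 of the flow above could look roughly like the following Express-style middleware. This is a minimal sketch under stated assumptions, not the platform's actual code: the cookie name `sid`, the in-memory `sessions` map, and the fixed-window limiter are hypothetical and chosen only for illustration (the Backend Stack section lists express-rate-limit for rate limiting in the real backend).

```javascript
// Hypothetical in-memory session store; a real deployment would use persistent storage.
const sessions = new Map([['abc123', { userId: 'u1' }]]);

// Parse a Cookie header ("a=1; b=2") into a plain object.
function parseCookies(header = '') {
  return Object.fromEntries(
    header
      .split(';')
      .map((part) => part.trim())
      .filter(Boolean)
      .map((part) => {
        const i = part.indexOf('=');
        return i === -1 ? [part, ''] : [part.slice(0, i), part.slice(i + 1)];
      })
  );
}

// Express-style middleware (req, res, next): reject requests without a known session.
// The cookie name "sid" is an assumption.
function validateSession(req, res, next) {
  const { sid } = parseCookies(req.headers.cookie);
  const session = sid ? sessions.get(sid) : undefined;
  if (!session) return res.status(401).json({ error: 'invalid session' });
  req.session = session;
  return next();
}

// Fixed-window rate limiter keyed by user; limits and window size are assumptions.
function createRateLimiter({ windowMs = 60000, max = 100, now = Date.now } = {}) {
  const hits = new Map(); // userId -> { count, windowStart }
  return function rateLimit(req, res, next) {
    const key = req.session.userId;
    const t = now();
    const entry = hits.get(key);
    if (!entry || t - entry.windowStart >= windowMs) {
      hits.set(key, { count: 1, windowStart: t }); // start a new window
      return next();
    }
    entry.count += 1;
    if (entry.count > max) return res.status(429).json({ error: 'rate limit exceeded' });
    return next();
  };
}
```

With Express, the two functions would be registered in order, for example `app.use(validateSession, createRateLimiter({ max: 60 }))`, so the limiter can rely on `req.session` being set.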

Effort-Based Model Routing

AI OS routes all Anthropic workloads to a single model, Opus 4.8, and varies the effort level per task. External models cover specialized roles:

| Tier | Model | Effort | Use Case | Cost Profile |
|---|---|---|---|---|
| Strategic | Opus 4.8 | xhigh | Orchestration, architecture, code review, security audits | $10/$50 per 1M tokens |
| Professional | Opus 4.8 | high | Research, coding, writing, design, media, lead gen | $5/$25 per 1M tokens |
| Scout | Opus 4.8 | low | Quick lookups, social intel, routine running | $5/$25 per 1M tokens |
| Economy | DeepSeek V4 | - | Batch processing, bulk content generation | $0.14/$0.28 per 1M tokens |
| Creative | Gemini Omni | - | Video, image, audio, thumbnails, social clips | $1.25/$5 per 1M tokens |
| Realtime | Grok-3 | - | Live data, current events, real-time queries | $3/$15 per 1M tokens |
| Reasoning | OpenAI GPT-4o / o3 | - | Alternative reasoning, code generation, Codex tasks | $2.50-$10/$10-$40 per 1M tokens |
| Grounded Search | Perplexity Sonar | - | Web search with source citations, research queries | $1-$3/$1-$15 per 1M tokens + request fees |
| Autonomous | Manus | - | Multi-step agent tasks, autonomous research and web interaction | Credit-based ($20-$200/mo) |
| Persistent | Hermes MCP | - | Background tasks, walkaway mode, cron jobs, approval gates | Varies by delegated model |
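
To make the tier selection concrete, here is a minimal sketch of how a router might map a task onto a subset of the tiers listed above. The task shape (`kind`, `complexity`) and the complexity thresholds are assumptions introduced for illustration; the source does not specify how complexity is scored.

```javascript
// Subset of the routing table above. Only the Opus tiers carry an effort level.
const ROUTES = {
  strategic: { model: 'Opus 4.8', effort: 'xhigh' },
  professional: { model: 'Opus 4.8', effort: 'high' },
  scout: { model: 'Opus 4.8', effort: 'low' },
  economy: { model: 'DeepSeek V4' },
  creative: { model: 'Gemini Omni' },
  realtime: { model: 'Grok-3' },
};

// Hypothetical task shape: { kind: string, complexity: number in [0, 1] }.
// Specialized kinds go to external models; everything else is routed by complexity.
function selectRoute(task) {
  if (task.kind === 'batch') return ROUTES.economy;
  if (task.kind === 'media') return ROUTES.creative;
  if (task.kind === 'live') return ROUTES.realtime;
  if (task.complexity >= 0.8) return ROUTES.strategic; // assumed threshold
  if (task.complexity >= 0.3) return ROUTES.professional; // assumed threshold
  return ROUTES.scout;
}
```

For example, `selectRoute({ kind: 'code-review', complexity: 0.9 })` yields the Strategic route, while a quick lookup with low complexity falls through to Scout.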

Opus 4.8 Features

  • 1M token context window — no beta header required, available by default
  • Adaptive thinking — model decides when to reason deeply (no manual budget_tokens)
  • Dynamic Workflows — plans large tasks, fans out to parallel subagents, resumes on interruption
  • Effort controls — xhigh for maximum reasoning, high for balanced, low for fast/cheap
  • Mid-conversation system messages — inject context updates between turns
  • Improved compaction — longer autonomous sessions with better recovery after context compression
  • Better tool triggering — fewer skipped tool calls compared to 4.7

The Orchestrator

The Orchestrator is the central intelligence that decides how to fulfill every request. Running on Opus 4.8 at xhigh effort (maximum reasoning depth), it:

  • Decomposes complex requests into sub-tasks
  • Selects which agents to invoke and in what order
  • Manages inter-agent dependencies and data handoffs
  • Escalates decisions when confidence is low
  • Synthesizes final outputs from multiple agent results
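
One way to picture the ordering and dependency management described above is a layered topological sort over sub-tasks. The sketch below is illustrative only: the plan shape (`id`, `dependsOn`) and the sub-task names are hypothetical, and the actual Orchestrator makes these decisions with the model rather than with a fixed algorithm.

```javascript
// Group sub-tasks into waves: every sub-task in a wave has all of its
// dependencies satisfied by earlier waves, so a wave can run in parallel.
function executionOrder(subtasks) {
  const remaining = new Map(subtasks.map((t) => [t.id, new Set(t.dependsOn ?? [])]));
  const waves = [];
  while (remaining.size > 0) {
    const ready = [...remaining]
      .filter(([, deps]) => deps.size === 0)
      .map(([id]) => id);
    if (ready.length === 0) {
      // Nothing can run: the plan has a cycle or references an unknown sub-task.
      throw new Error('unsatisfiable dependencies');
    }
    waves.push(ready);
    for (const id of ready) remaining.delete(id);
    for (const deps of remaining.values()) {
      for (const id of ready) deps.delete(id);
    }
  }
  return waves;
}

// Hypothetical plan: research and design can run together, writing needs both.
const plan = [
  { id: 'research' },
  { id: 'design' },
  { id: 'write', dependsOn: ['research', 'design'] },
  { id: 'review', dependsOn: ['write'] },
];
const waves = executionOrder(plan);
```

Here `waves` groups `research` and `design` first, then `write`, then `review`, which matches the idea of handing results from earlier agents to later ones.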

Memory Vault

The .magent/ directory serves as the persistent memory layer:

  • vault/raw/ — Ingested source documents and data
  • vault/wiki/ — Processed knowledge articles
  • vault/outputs/ — Generated artifacts and deliverables
  • state/ — Runtime state with JSON persistence (auto-saved with debounce)
  • team.yaml — Agent roster, roles, and escalation paths
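
The debounced auto-save mentioned for `state/` can be sketched as follows. This is an assumption-laden illustration rather than the actual persistence code: the function name `createStateStore`, the 500 ms delay, and the write-then-rename step are choices made here, not details from the source.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Returns a small key/value store. set() schedules one write after `delayMs`
// without further changes, so bursts of updates produce a single file write.
function createStateStore(filePath, delayMs = 500) {
  const state = {};
  let timer = null;

  function flush() {
    if (timer) {
      clearTimeout(timer);
      timer = null;
    }
    fs.mkdirSync(path.dirname(filePath), { recursive: true });
    // Write to a temp file and rename it, so an interrupted write
    // does not leave a truncated state file behind.
    const tmp = `${filePath}.tmp`;
    fs.writeFileSync(tmp, JSON.stringify(state, null, 2));
    fs.renameSync(tmp, filePath);
  }

  function set(key, value) {
    state[key] = value;
    if (timer) clearTimeout(timer); // restart the quiet period
    timer = setTimeout(flush, delayMs);
  }

  return { set, flush, get: (key) => state[key] };
}
```

A caller might create the store with a path such as `.magent/state/runtime.json` (a hypothetical file name) and call `flush()` on shutdown so that pending changes are not lost.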

Backend Stack

| Component | Technology |
|---|---|
| Runtime | Node.js 20 + Express |
| Real-time | WebSocket (ws) with heartbeat |
| Security | Helmet CSP, CORS, express-rate-limit |
| Auth | Session cookies + Bearer token API auth |
| Payments | Stripe Checkout + Webhooks |
| State | File-based JSON with debounced auto-save |
| Agent Definitions | Markdown + YAML frontmatter |
| Process Manager | PM2 with memory limits and log rotation |

Frontend Stack

The dashboard is built with vanilla HTML, CSS, and JavaScript — no build tools, no framework dependencies:

  • WebSocket — Auto-reconnect with exponential backoff (max 30s), visual reconnect banner
  • CSS Variables — Centralized design tokens for consistent theming
  • View Router — Hash-based navigation across 16+ dashboard views
  • Inter + JetBrains Mono — Google Fonts for UI and code typography
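
The reconnect behavior can be sketched in a few lines of vanilla JavaScript. The 30-second cap comes from the description above; the 1-second base delay, the doubling factor, and the `onStatus` callback are assumptions made for this example.

```javascript
const BASE_DELAY_MS = 1000; // assumed starting delay
const MAX_DELAY_MS = 30000; // 30s cap, as described above

// Delay before reconnect attempt number `attempt` (0-based): 1s, 2s, 4s, ... capped at 30s.
function reconnectDelay(attempt) {
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}

// Browser-side wiring. `onStatus` is a hypothetical hook the dashboard could use
// to show or hide its reconnect banner.
function connect(url, onStatus = () => {}, attempt = 0) {
  const socket = new WebSocket(url);
  socket.addEventListener('open', () => {
    attempt = 0; // a successful connection resets the backoff
    onStatus('connected');
  });
  socket.addEventListener('close', () => {
    onStatus('reconnecting');
    setTimeout(() => connect(url, onStatus, attempt + 1), reconnectDelay(attempt));
  });
  return socket;
}
```

Production implementations often add random jitter to the delay so that many clients do not reconnect at the same instant after a server restart; that is omitted here for brevity.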