Architecture
AI OS is built around three pillars: the Orchestrator, the Agent Fleet, and the Memory Vault. Every request flows through a multi-model routing layer that matches each task to a model and, for Anthropic workloads, an effort level, based on task type and cost.
System Topology
The platform is organized into three vertical layers:
- Public Layer — Landing page, legal pages, documentation. Accessible without authentication.
- Authenticated Layer — Dashboard with 16+ views, gated behind an active Stripe subscription and a valid session cookie.
- Intelligence Layer — Orchestrator, agent fleet, multi-model routing, and memory vault. Accessed exclusively through the backend API.
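The access rules for these layers can be pictured as a small gate function. This is a minimal sketch, not the platform's code. The PUBLIC_PREFIXES list, the Visitor fields, and the function names are hypothetical. It also assumes the session and subscription checks have already run.

```typescript
// Hypothetical sketch: URL prefixes and field names are illustrative only.
type Layer = "public" | "authenticated";

interface Visitor {
  hasValidSession: boolean; // session cookie already verified
  hasActiveSubscription: boolean; // e.g. kept in sync via Stripe webhooks
}

// Assumed public routes: landing page ("/"), legal pages, documentation.
const PUBLIC_PREFIXES = ["/legal/", "/docs/"];

function layerFor(path: string): Layer {
  if (path === "/" || PUBLIC_PREFIXES.some((prefix) => path.startsWith(prefix))) {
    return "public";
  }
  return "authenticated";
}

// The Intelligence Layer has no browser-facing route, so it never appears
// here; only backend handlers that already passed this gate call into it.
function canAccess(path: string, visitor: Visitor): boolean {
  if (layerFor(path) === "public") return true;
  return visitor.hasValidSession && visitor.hasActiveSubscription;
}
```

The design keeps the Intelligence Layer unreachable by construction: no path maps to it, so the only way in is through backend code that has already passed the authenticated gate.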
Request Flow
When a user initiates a task from the dashboard:
- The browser sends an HTTP request to the Express backend
- Middleware validates the session cookie and checks rate limits
- The Orchestrator (Opus 4.8 at xhigh effort) analyzes the request and determines which agent(s) to invoke
- The effort router selects the optimal effort level (xhigh/high/low) based on task complexity and cost profile
- The agent executes, optionally reading from or writing to the Memory Vault
- Results stream back via WebSocket for real-time dashboard updates
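The steps above can be condensed into one handler. This is an illustrative, synchronous sketch, not the actual Express code. Several pieces are assumptions chosen for the example: TaskDeps, handleTask, the emit callback (which stands in for a WebSocket send), and the 401/429/202 status codes.

```typescript
// Hypothetical sketch of the request flow; every name here is illustrative.
type Effort = "xhigh" | "high" | "low";

interface TaskRequest {
  sessionId: string | null; // from the session cookie
  clientId: string; // key used for rate limiting
  prompt: string;
}

interface TaskDeps {
  isValidSession: (sessionId: string) => boolean;
  withinRateLimit: (clientId: string) => boolean;
  selectAgents: (prompt: string) => string[]; // Orchestrator
  selectEffort: (prompt: string) => Effort; // effort router
  runAgent: (agent: string, effort: Effort, prompt: string) => string; // may use the Memory Vault
}

// Returns an HTTP status; results are pushed through `emit`, which stands in
// for a WebSocket send to the dashboard.
function handleTask(req: TaskRequest, deps: TaskDeps, emit: (message: string) => void): number {
  // Middleware: session check first, then rate limit.
  if (req.sessionId === null || !deps.isValidSession(req.sessionId)) return 401;
  if (!deps.withinRateLimit(req.clientId)) return 429;

  // Orchestrator picks agents; effort router picks the effort level.
  const agents = deps.selectAgents(req.prompt);
  const effort = deps.selectEffort(req.prompt);

  // Agents run in order here; real execution would be asynchronous.
  for (const agent of agents) {
    emit(deps.runAgent(agent, effort, req.prompt));
  }
  return 202; // accepted: the dashboard receives results over WebSocket
}
```

Injecting the session, rate-limit, orchestration, and agent functions keeps the order of the steps visible. It also lets each step be stubbed in tests.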
Effort-Based Model Routing
AI OS routes all Anthropic workloads to a single model, Opus 4.8, and varies the effort level by tier; external models cover specialized roles:
| Tier | Model | Effort | Use Case | Cost Profile |
|---|---|---|---|---|
| Strategic | Opus 4.8 | xhigh | Orchestration, architecture, code review, security audits | $10/$50 per 1M tokens |
| Professional | Opus 4.8 | high | Research, coding, writing, design, media, lead gen | $5/$25 per 1M tokens |
| Scout | Opus 4.8 | low | Quick lookups, social intel, routine tasks | $5/$25 per 1M tokens |
| Economy | DeepSeek V4 | — | Batch processing, bulk content generation | $0.14/$0.28 per 1M tokens |
| Creative | Gemini Omni | — | Video, image, audio, thumbnails, social clips | $1.25/$5 per 1M tokens |
| Realtime | Grok-3 | — | Live data, current events, real-time queries | $3/$15 per 1M tokens |
| Reasoning | OpenAI GPT-4o / o3 | — | Alternative reasoning, code generation, Codex tasks | $2.50-$10/$10-$40 per 1M tokens |
| Grounded Search | Perplexity Sonar | — | Web search with source citations, research queries | $1-$3/$1-$15 per 1M tokens + per-request fees |
| Autonomous | Manus | — | Multi-step agent tasks, autonomous research and web interaction | Credit-based ($20-$200/mo) |
| Persistent | Hermes MCP | — | Background tasks, walkaway mode, cron jobs, approval gates | Varies by delegated model |
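A routing table like this one maps naturally onto a lookup plus a cost estimate. The sketch below assumes hypothetical task-category names and a fallback to the Professional tier. The prices are copied from the table and are only as current as the table itself.

```typescript
// Prices (USD per 1M tokens, input/output) copied from the routing table.
// Task-category names and the default tier are hypothetical.
type Effort = "xhigh" | "high" | "low";
type Tier = "strategic" | "professional" | "scout" | "economy" | "creative" | "realtime";

interface Route {
  model: string;
  effort?: Effort; // only the Opus 4.8 tiers use effort levels
  inputPerM: number;
  outputPerM: number;
}

const ROUTES: Record<Tier, Route> = {
  strategic: { model: "Opus 4.8", effort: "xhigh", inputPerM: 10, outputPerM: 50 },
  professional: { model: "Opus 4.8", effort: "high", inputPerM: 5, outputPerM: 25 },
  scout: { model: "Opus 4.8", effort: "low", inputPerM: 5, outputPerM: 25 },
  economy: { model: "DeepSeek V4", inputPerM: 0.14, outputPerM: 0.28 },
  creative: { model: "Gemini Omni", inputPerM: 1.25, outputPerM: 5 },
  realtime: { model: "Grok-3", inputPerM: 3, outputPerM: 15 },
};

// Hypothetical task categories, loosely following the Use Case column.
const TIER_FOR_TASK = new Map<string, Tier>([
  ["orchestration", "strategic"],
  ["security_audit", "strategic"],
  ["research", "professional"],
  ["coding", "professional"],
  ["quick_lookup", "scout"],
  ["bulk_content", "economy"],
  ["thumbnail", "creative"],
  ["live_news", "realtime"],
]);

function routeTask(task: string): Route {
  // Assumption: unknown task types fall back to the Professional tier.
  return ROUTES[TIER_FOR_TASK.get(task) ?? "professional"];
}

function estimateCostUsd(tier: Tier, inputTokens: number, outputTokens: number): number {
  const route = ROUTES[tier];
  return (inputTokens / 1_000_000) * route.inputPerM + (outputTokens / 1_000_000) * route.outputPerM;
}
```

For example, a Professional-tier job with 200,000 input tokens and 20,000 output tokens comes to 0.2 × $5 + 0.02 × $25 = $1.50 at the listed prices. The OpenAI, Perplexity, Manus, and Hermes MCP rows are left out for different reasons: their prices are ranges, include per-request fees, or depend on credits or on the delegated model.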
Opus 4.8 Features
- 1M token context window — no beta header required, available by default
- Adaptive thinking — the model decides when to reason deeply, so no manual budget_tokens setting is needed
- Dynamic Workflows — plans large tasks, fans out to parallel subagents, resumes on interruption
- Effort controls — xhigh for maximum reasoning, high for balanced, low for fast/cheap
- Mid-conversation system messages — inject context updates between turns
- Improved compaction — longer autonomous sessions with better recovery after context compression
- Better tool triggering — fewer skipped tool calls than Opus 4.7
The Orchestrator
The Orchestrator is the central intelligence that decides how to fulfill every request. Running on Opus 4.8 at xhigh effort (maximum reasoning depth), it:
- Decomposes complex requests into sub-tasks
- Selects which agents to invoke and in what order
- Manages inter-agent dependencies and data handoffs
- Escalates decisions when confidence is low
- Synthesizes final outputs from multiple agent results
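One way to handle agent ordering and inter-agent dependencies is to group sub-tasks into dependency "waves" with a topological sort (Kahn's algorithm). Every task in a wave depends only on earlier waves, so a whole wave can fan out in parallel. The source does not say which scheduling method the Orchestrator uses. The SubTask shape and planWaves below are hypothetical.

```typescript
// Hypothetical shape of a decomposed sub-task.
interface SubTask {
  id: string;
  agent: string; // agent that will execute this sub-task
  dependsOn: string[]; // ids whose output this sub-task needs
}

// Kahn-style topological sort that groups sub-tasks into waves: each wave
// depends only on earlier waves, so its tasks can run in parallel.
// Throws on unknown dependencies or cycles.
function planWaves(tasks: SubTask[]): SubTask[][] {
  const byId = new Map<string, SubTask>();
  for (const task of tasks) byId.set(task.id, task);

  const unmet = new Map<string, number>(); // remaining dependency count per task
  const dependents = new Map<string, string[]>(); // dep id -> ids waiting on it
  for (const task of tasks) {
    unmet.set(task.id, task.dependsOn.length);
    for (const dep of task.dependsOn) {
      if (!byId.has(dep)) throw new Error(`unknown dependency "${dep}" in "${task.id}"`);
      const waiting = dependents.get(dep) ?? [];
      waiting.push(task.id);
      dependents.set(dep, waiting);
    }
  }

  const waves: SubTask[][] = [];
  let ready = tasks.filter((task) => task.dependsOn.length === 0);
  let scheduled = 0;
  while (ready.length > 0) {
    waves.push(ready);
    scheduled += ready.length;
    const next: SubTask[] = [];
    for (const task of ready) {
      for (const childId of dependents.get(task.id) ?? []) {
        const left = (unmet.get(childId) ?? 0) - 1;
        unmet.set(childId, left);
        const child = byId.get(childId);
        if (left === 0 && child !== undefined) next.push(child);
      }
    }
    ready = next;
  }
  // Tasks left unscheduled can only be waiting on each other.
  if (scheduled !== tasks.length) throw new Error("dependency cycle between sub-tasks");
  return waves;
}
```

Each inner array is a batch the Orchestrator could dispatch concurrently before synthesizing the final output from all results.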
Memory Vault
The .magent/ directory serves as the persistent memory layer:
- vault/raw/ — Ingested source documents and data
- vault/wiki/ — Processed knowledge articles
- vault/outputs/ — Generated artifacts and deliverables
- state/ — Runtime state with JSON persistence (auto-saved after a debounce delay)
- team.yaml — Agent roster, roles, and escalation paths
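The debounced auto-save in state/ can be sketched as a small class that collapses bursts of state changes into one JSON write. DebouncedSaver and its 500 ms default are assumptions. The writer is injected so the sketch stays independent of the file system.

```typescript
// Minimal sketch of debounced JSON persistence, as used for state/.
// The writer is injected (in Node it could wrap fs.writeFileSync); the
// 500 ms default delay is an assumption, not a documented value.
class DebouncedSaver<T> {
  private readonly write: (json: string) => void;
  private readonly delayMs: number;
  private timer: ReturnType<typeof setTimeout> | undefined;
  private pending: { state: T } | undefined;

  constructor(write: (json: string) => void, delayMs = 500) {
    this.write = write;
    this.delayMs = delayMs;
  }

  // Remember the latest state and restart the timer, so a burst of
  // changes collapses into a single write after the burst ends.
  schedule(state: T): void {
    this.pending = { state };
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = setTimeout(() => this.flush(), this.delayMs);
  }

  // Write any pending state immediately (e.g. on shutdown) and cancel the timer.
  flush(): void {
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = undefined;
    if (this.pending === undefined) return;
    const { state } = this.pending;
    this.pending = undefined;
    this.write(JSON.stringify(state, null, 2));
  }
}
```

Calling flush() from a shutdown hook, for example before PM2 restarts the process, would avoid losing the last burst of changes.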
Backend Stack
| Component | Technology |
|---|---|
| Runtime | Node.js 20 + Express |
| Real-time | WebSocket (ws) with heartbeat |
| Security | Helmet CSP, CORS, express-rate-limit |
| Auth | Session cookies + Bearer token API auth |
| Payments | Stripe Checkout + Webhooks |
| State | File-based JSON with debounced auto-save |
| Agent Definitions | Markdown + YAML frontmatter |
| Process Manager | PM2 with memory limits and log rotation |
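Agent definitions combine YAML frontmatter with a Markdown body. The sketch below splits the two and reads only flat key: value pairs. The field names in the example (name, role, effort) are hypothetical. A real implementation would use a full YAML parser such as js-yaml.

```typescript
// Minimal sketch: split an agent definition into YAML frontmatter and a
// Markdown body. Only flat `key: value` lines are read; nested YAML, lists,
// and quoted strings need a real YAML parser.
interface AgentDefinition {
  meta: Record<string, string>;
  body: string; // Markdown instructions for the agent
}

const FRONTMATTER = /^---\r?\n([\s\S]*?)\r?\n---(?:\r?\n|$)([\s\S]*)$/;

function parseAgentDefinition(source: string): AgentDefinition {
  const match = FRONTMATTER.exec(source);
  if (match === null) throw new Error("agent definition is missing YAML frontmatter");
  const [, header = "", body = ""] = match;

  const meta: Record<string, string> = {};
  for (const line of header.split(/\r?\n/)) {
    if (line.trimStart().startsWith("#")) continue; // YAML comment
    const colon = line.indexOf(":");
    if (colon <= 0) continue; // not a simple key: value line
    meta[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  return { meta, body: body.trim() };
}
```

Keeping metadata in frontmatter and instructions in Markdown lets the roster data stay machine-readable while the prompt remains ordinary prose.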
Frontend Stack
The dashboard is built with vanilla HTML, CSS, and JavaScript — no build tools, no framework dependencies:
- WebSocket — Auto-reconnect with exponential backoff (capped at 30 s) and a visual reconnect banner
- CSS Variables — Centralized design tokens for consistent theming
- View Router — Hash-based navigation across 16+ dashboard views
- Inter + JetBrains Mono — Google Fonts for UI and code typography
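The dashboard's reconnect behavior can be sketched as a capped exponential backoff. Only the 30-second cap comes from the text. The 1-second base delay, ReconnectingConnection, and the handler names are assumptions. The socket is injected so the logic does not depend on a particular WebSocket implementation.

```typescript
// Exponential backoff capped at 30 s; the 1 s base delay is an assumption.
const BASE_DELAY_MS = 1_000;
const MAX_DELAY_MS = 30_000;

function reconnectDelay(attempt: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}

interface ConnectionHandlers {
  onOpen: () => void;
  onClose: () => void;
}

// `connect` opens a socket and wires its open/close events to the handlers;
// `showBanner` toggles the visual reconnect banner.
class ReconnectingConnection {
  private attempt = 0;
  private readonly connect: (handlers: ConnectionHandlers) => void;
  private readonly showBanner: (visible: boolean) => void;

  constructor(connect: (handlers: ConnectionHandlers) => void, showBanner: (visible: boolean) => void) {
    this.connect = connect;
    this.showBanner = showBanner;
  }

  start(): void {
    this.connect({
      onOpen: () => {
        this.attempt = 0; // reset backoff after a successful connection
        this.showBanner(false);
      },
      onClose: () => {
        this.showBanner(true);
        const delay = reconnectDelay(this.attempt);
        this.attempt += 1;
        setTimeout(() => this.start(), delay);
      },
    });
  }
}
```

In the browser, connect could create a new WebSocket(url) and assign onOpen and onClose to its onopen and onclose properties. Retry delays would then be 1 s, 2 s, 4 s, 8 s, and 16 s, then 30 s from the sixth retry on.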