StateSet Computer Use Agent - Architecture Overview
Executive Summary
StateSet Computer Use Agent is a production-grade AI automation platform powered by Claude Opus 4.5. The system deploys multiple specialized AI agents that can see, understand, and interact with desktop environments to complete complex, long-running tasks autonomously. Built in Python with async/await patterns throughout, the platform implements Anthropic's context engineering patterns, achieving roughly 95% cost savings compared to naive approaches.
Key Metrics:
- Average tokens/task: 7,500 (95% reduction from 150k baseline)
- Average cost/task: $0.11 (95% savings from the $2.25 baseline)
- Average task duration: 30 seconds (33% faster with parallel execution)
- Parallel speedup: 30-50% on multi-tool tasks
System Architecture Diagram
┌─────────────────────────────────────────────────────────────────────────────┐
│ User Interface │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ CLI/Shell │ │ Dashboard │ │ APIs │ │
│ │ Scripts │ │ (Next.js) │ │ (REST) │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
└──────────────────┼─────────────────┼─────────────────┼───────────────────────┘
│ │ │
┌──────────────────▼─────────────────▼─────────────────▼───────────────────────┐
│ ORCHESTRATION LAYER │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ main.py │ │
│ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────────┐│ │
│ │ │ Agent Selector │ │ GlobalState │ │ Multi-Agent Runner ││ │
│ │ │ (keyword-based) │ │ (thread-safe) │ │ (asyncio.gather) ││ │
│ │ └─────────────────┘ └─────────────────┘ └─────────────────────────┘│ │
│ └────────────────────────────────────────────────────────────────────────┘ │
└───────────────────────────────────┬─────────────────────────────────────────┘
│
┌───────────────────────────────────▼─────────────────────────────────────────┐
│ AGENT ENGINE │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ agent/loop.py │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌────────────┐ │ │
│ │ │ Sampling │ │ API │ │ System │ │ Message │ │ │
│ │ │ Loop │ │ Providers │ │ Prompt │ │ Manager │ │ │
│ │ │ │ │ (3 backends) │ │ Init │ │ (cache) │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ └────────────┘ │ │
│ └────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌────────────────────┐ ┌────────────────────┐ ┌────────────────────────┐ │
│ │ SubagentManager │ │ MCPManager │ │ StructuredOutput │ │
│ │ (task delegation) │ │ (external tools) │ │ Parser │ │
│ └────────────────────┘ └────────────────────┘ └────────────────────────┘ │
└───────────────────────────────────┬─────────────────────────────────────────┘
│
┌───────────────────────────────────▼─────────────────────────────────────────┐
│ TOOL LAYER │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ ToolCollection │ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────────────┐│ │
│ │ │ Computer │ │ Bash │ │ Edit │ │ Memory ││ │
│ │ │ Tool │ │ Tool │ │ Tool │ │ Tool ││ │
│ │ │ (GUI ops) │ │ (commands) │ │ (files) │ │ (persistence) ││ │
│ │ └────────────┘ └────────────┘ └────────────┘ └────────────────────┘│ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────────────┐│ │
│ │ │ AGI │ │ Subagent │ │ StateSet │ │ AskUser ││ │
│ │ │ Tool │ │ Tool │ │ CLI Tool │ │ Tool ││ │
│ │ └────────────┘ └────────────┘ └────────────┘ └────────────────────┘│ │
│ └──────────────────────────────────────────────────────────────────────┘ │
└───────────────────────────────────┬─────────────────────────────────────────┘
│
┌───────────────────────────────────▼─────────────────────────────────────────┐
│ OPTIMIZATION LAYER │
│ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────────────┐│
│ │ ParallelExecutor │ │ ContextOptimizer │ │ ToolExecutionGuard ││
│ │ (30-50% speedup) │ │ (5 patterns) │ │ (safety + verification) ││
│ └───────────────────┘ └───────────────────┘ └───────────────────────────┘│
│ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────────────┐│
│ │ StuckDetection │ │ Verification │ │ Checkpoint ││
│ │ (loop prevention) │ │ (visual confirm) │ │ (state persistence) ││
│ └───────────────────┘ └───────────────────┘ └───────────────────────────┘│
└───────────────────────────────────┬─────────────────────────────────────────┘
│
┌───────────────────────────────────▼─────────────────────────────────────────┐
│ OBSERVABILITY LAYER │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ UnifiedObservability │ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────────────┐│ │
│ │ │ Structured │ │OpenTelemetry│ │ Prometheus │ │ Real-time Event ││ │
│ │ │ Logging │ │ Tracing │ │ Metrics │ │ Streaming ││ │
│ │ └────────────┘ └────────────┘ └────────────┘ └────────────────────┘│ │
│ └──────────────────────────────────────────────────────────────────────┘ │
└───────────────────────────────────┬─────────────────────────────────────────┘
│
┌───────────────────────────────────▼─────────────────────────────────────────┐
│ EXTERNAL SERVICES │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────────────────┐ │
│ │ Anthropic │ │ StateSet │ │ Stripe │ │ MCP Servers │ │
│ │ API │ │ APIs │ │ Billing │ │ (Slack, GitHub, etc.) │ │
│ └────────────┘ └────────────┘ └────────────┘ └────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
Core Components
1. Main Orchestrator (main.py)
The entry point for all agent execution, responsible for:
Environment Validation:
def validate_environment(*, require_display: bool = True) -> Dict[str, str]:
    """Validates ANTHROPIC_API_KEY, DISPLAY, STRIPE_API_KEY, WORKSPACE_PATH"""
Agent Selection:
def get_active_agents(instruction: str) -> List[AgentType]:
    """Keyword-based agent selection from instruction text"""
    # Matches: "auto-close" → AUTO_CLOSE, "social media" → SOCIAL_MEDIA, etc.
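In sketch form, the keyword matcher might look like the following. The `AgentType` members and the keyword table here are illustrative stand-ins; the real mapping lives in main.py:

```python
from enum import Enum, auto
from typing import List

class AgentType(Enum):
    AUTO_CLOSE = auto()
    SOCIAL_MEDIA = auto()
    STATESET_AGENTIC = auto()   # generic fallback agent

# Illustrative keyword map; the real table is defined in main.py.
KEYWORD_MAP = {
    "auto-close": AgentType.AUTO_CLOSE,
    "ticket": AgentType.AUTO_CLOSE,
    "social media": AgentType.SOCIAL_MEDIA,
    "moderate": AgentType.SOCIAL_MEDIA,
}

def get_active_agents(instruction: str) -> List[AgentType]:
    """Match keywords in the instruction; fall back to the generic agent."""
    text = instruction.lower()
    matched = [agent for kw, agent in KEYWORD_MAP.items() if kw in text]
    # Deduplicate while preserving first-match order.
    seen: set = set()
    agents = [a for a in matched if not (a in seen or seen.add(a))]
    return agents or [AgentType.STATESET_AGENTIC]
```

An instruction like "auto-close and social media" then selects both agents, which `continuous_loop` runs in parallel.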
Global State Management:
class GlobalState:
    running: bool               # System-wide running flag
    tasks: Set[asyncio.Task]    # Active agent tasks
    shutdown_event: Event       # Graceful shutdown coordination
    _lock: threading.Lock       # Thread-safe state management
Multi-Agent Execution:
async def continuous_loop(agents: List[AgentConfig], instruction: str):
    """Spawns agents in parallel using asyncio.gather()"""
    tasks = [asyncio.create_task(run_agent(agent, instruction)) for agent in agents]
    results = await asyncio.gather(*tasks, return_exceptions=True)
Task Completion Analysis:
async def analyze_task_completion(messages, agent_type) -> TaskStatus:
    """Agent-specific completion detection with indicator patterns"""
    # AUTO_CLOSE: "ticket closed", "successfully closed", "task finished"
    # SOCIAL_MEDIA: "comment hidden", "content removed", "moderation complete"
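A minimal sketch of indicator-based completion detection, assuming messages are dicts with `role`/`content` keys. The indicator lists come from the comments above; the real implementation in main.py may differ:

```python
from enum import Enum
from typing import Dict, List

class TaskStatus(Enum):
    COMPLETE = "complete"
    IN_PROGRESS = "in_progress"

# Illustrative indicator table; the real patterns live in main.py.
COMPLETION_INDICATORS: Dict[str, List[str]] = {
    "AUTO_CLOSE": ["ticket closed", "successfully closed", "task finished"],
    "SOCIAL_MEDIA": ["comment hidden", "content removed", "moderation complete"],
}

def analyze_task_completion(messages: List[dict], agent_type: str) -> TaskStatus:
    """Scan assistant messages (newest first) for completion phrases."""
    indicators = COMPLETION_INDICATORS.get(agent_type, [])
    for message in reversed(messages):
        if message.get("role") != "assistant":
            continue
        text = str(message.get("content", "")).lower()
        if any(phrase in text for phrase in indicators):
            return TaskStatus.COMPLETE
    return TaskStatus.IN_PROGRESS
```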
2. Agent Loop (agent/loop.py)
The core conversation engine with Claude API:
Sampling Loop:
async def sampling_loop(
    model: str,                        # claude-opus-4-5-20251101
    provider: APIProvider,             # ANTHROPIC | BEDROCK | VERTEX
    system_prompt_suffix: str,         # Agent-specific rules
    messages: List[BetaMessageParam],
    tool_collection: ToolCollection,
    # New capabilities
    enable_subagents: bool = True,
    mcp_servers: Optional[Dict] = None,
    output_schema: Optional[Dict] = None,
) -> SamplingLoopResult:
API Provider Support:
| Provider | Model ID | Use Case |
|---|---|---|
| ANTHROPIC | claude-opus-4-5-20251101 | Direct API access |
| BEDROCK | anthropic.claude-opus-4-5-20251101-v1:0 | AWS infrastructure |
| VERTEX | claude-opus-4-5-20251101 | Google Cloud |
Beta Flags:
- prompt-caching-2024-07-31 - 90% cost reduction on cached tokens
- advanced-tool-use-2025-11-20 - Tool search (regex/bm25)
- effort-2025-11-24 - Effort parameter (low/medium/high)
- computer-use-2025-11-24 - Latest tool version with zoom action
System Prompt Initialization:
async def initialize_system_prompt(agent_config: AgentConfig) -> str:
    """Fetches rules/attributes from StateSet APIs:
    - /api/rules/get-agent-rules
    - /api/attributes/get-agent-attributes
    - /api/agents/get-agent
    """
3. Tool System (agent/tools/)
Tool Hierarchy:
BaseAnthropicTool (Abstract)
├── ComputerTool (3 versions)
│ ├── Actions: screenshot, click, type, scroll, drag, zoom
│ ├── Resolution scaling (XGA, WXGA, FWXGA)
│ └── Performance: 8ms typing delay, 100-char groups
├── BashTool
│ ├── Persistent session with sentinel pattern
│ ├── Async subprocess management
│ └── 60-second timeout
├── EditTool
│ ├── File creation/modification
│ └── Directory traversal prevention
├── MemoryTool
│ ├── Commands: view, create, str_replace, insert, delete, rename
│ ├── Prompt injection sanitization
│ └── Per-agent memory isolation
├── AGITool
│ └── Extended AGI capabilities
├── SubagentTool (lazy-loaded)
│ └── Spawn specialized subagents
└── AskUserTool
└── Human-in-the-loop requests
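The BashTool's sentinel pattern can be sketched as follows: one shell process stays alive across calls, and each command is followed by an echoed marker so the reader knows where output ends. This is a simplified, synchronous illustration (the real tool is async and enforces the 60-second timeout); `__SENTINEL_DONE__` is a made-up marker:

```python
import subprocess

SENTINEL = "__SENTINEL_DONE__"  # hypothetical marker; the real tool uses its own

class BashSession:
    """Persistent shell: commands share one process and its state (cwd, env)."""

    def __init__(self) -> None:
        self._proc = subprocess.Popen(
            ["bash"], stdin=subprocess.PIPE, stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT, text=True, bufsize=1,
        )

    def run(self, command: str) -> str:
        # Echo the sentinel after the command so we know where output ends.
        self._proc.stdin.write(f"{command}; echo {SENTINEL}\n")
        self._proc.stdin.flush()
        lines = []
        while True:
            line = self._proc.stdout.readline()
            if line.strip() == SENTINEL:
                break
            lines.append(line)
        return "".join(lines)

    def close(self) -> None:
        self._proc.terminate()
```

Because the process persists, `export FOO=bar` in one call is still visible to `echo $FOO` in the next, which is the point of the pattern.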
Tool Versions:
| Version | Release | Features |
|---|---|---|
| computer_use_20251124 | Current | Zoom action, deferred tool loading |
| computer_use_20250124 | Previous | Stable production version |
| computer_use_20241022 | Legacy | Backward compatibility |
ToolCollection API:
class ToolCollection:
    tool_map: Dict[str, BaseAnthropicTool]          # name → tool
    def to_params(self) -> List[Dict]               # Convert to API format
    async def run(self, name, input) -> ToolResult
    def set_deferred_tools(self, tools: List[str])  # For tool search
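A toy version of the dispatch shows the shape of the API. `EchoTool` is a stand-in; real tools subclass BaseAnthropicTool and return richer ToolResult objects:

```python
import asyncio
from typing import Any, Dict, List

class ToolResult:
    def __init__(self, output: str = "", error: str = ""):
        self.output, self.error = output, error

class EchoTool:
    """Stand-in tool; real tools derive from BaseAnthropicTool."""
    name = "echo"

    def to_params(self) -> Dict[str, Any]:
        return {"name": self.name, "input_schema": {"type": "object"}}

    async def __call__(self, **kwargs) -> ToolResult:
        return ToolResult(output=kwargs.get("text", ""))

class ToolCollection:
    def __init__(self, *tools):
        self.tool_map = {t.name: t for t in tools}   # name → tool

    def to_params(self) -> List[Dict[str, Any]]:
        return [t.to_params() for t in self.tool_map.values()]

    async def run(self, name: str, tool_input: Dict[str, Any]) -> ToolResult:
        tool = self.tool_map.get(name)
        if tool is None:
            return ToolResult(error=f"unknown tool: {name}")
        return await tool(**tool_input)
```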
Advanced Capabilities
4. Subagent System (agent/subagent.py)
Implements Anthropic’s sub-agent compression pattern for 95% context savings:
Subagent Types:
| Type | Model | Max Tokens | Use Case |
|---|---|---|---|
| EXPLORE | Haiku | 4096 | Fast codebase exploration |
| ANALYZE | Sonnet | 8192 | Deep analysis with thinking |
| EXECUTE | Sonnet | 4096 | Task execution with verification |
| RESEARCH | Haiku | 4096 | Web search and synthesis |
| CODE | Sonnet | 8192 | Code generation/modification |
Architecture:
MainAgent (Opus 4.5)
│
├── spawn_subagent("explore", "Find auth files")
│ └── Returns: 2k summary (not 50k raw output)
│
├── spawn_subagent("analyze", "Review patterns")
│ └── Returns: Structured insights
│
└── spawn_subagent("execute", "Refactor code")
└── Returns: Confirmation + diff
Usage:
from agent.subagent import SubagentManager, SubagentType
manager = SubagentManager(api_key)
result = await manager.spawn(
    task="Analyze the authentication flow",
    subagent_type=SubagentType.ANALYZE,
)
# result.result contains compressed summary
5. MCP Client Integration (agent/mcp_client.py)
Connect to external Model Context Protocol servers:
Supported Transports:
- STDIO (subprocess)
- SSE (Server-Sent Events)
- HTTP (direct HTTP)
Preset Servers:
PRESET_SERVERS = {
    "slack": {"command": "npx", "args": ["-y", "@modelcontextprotocol/server-slack"]},
    "github": {"command": "npx", "args": ["-y", "@modelcontextprotocol/server-github"]},
    "postgres": {"command": "npx", "args": ["-y", "@modelcontextprotocol/server-postgres"]},
    "filesystem": {...},
    "memory": {...},
    "brave-search": {...},
    "puppeteer": {...},
    "sqlite": {...},
}
Usage in sampling_loop:
result = await sampling_loop(
    mcp_servers={
        "slack": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-slack"],
            "env": {"SLACK_BOT_TOKEN": os.environ["SLACK_BOT_TOKEN"]},
        }
    },
    # Agent now has access to mcp__slack__send_message, etc.
)
6. Structured Output (agent/structured_output.py)
Force Claude to return valid JSON matching specified schemas:
Pre-defined Schemas:
- TICKET_ANALYSIS_SCHEMA - Support ticket analysis
- TASK_RESULT_SCHEMA - Task completion results
- CODE_ANALYSIS_SCHEMA - Code review findings
- ENTITY_EXTRACTION_SCHEMA - Entity extraction
Usage:
from agent.structured_output import OutputSchema, StructuredOutputParser
schema = OutputSchema(
    name="TicketAnalysis",
    schema={
        "type": "object",
        "properties": {
            "tickets_to_close": {"type": "array", "items": {"type": "string"}},
            "summary": {"type": "string"},
            "confidence": {"type": "number", "minimum": 0, "maximum": 1},
        },
        "required": ["tickets_to_close", "summary"],
    },
)
result = await sampling_loop(output_schema=schema.schema, ...)
parser = StructuredOutputParser(schema)
data = parser.parse(response_text)  # Validates against schema
Optimization Systems
7. Parallel Executor (agent/parallel_executor.py)
Automatic parallel execution for independent tool calls:
Dependency Analysis:
class DependencyAnalyzer:
    def analyze(self, tool_calls: List[ToolCall]) -> ExecutionPlan:
        """
        Rules:
        - Computer tool calls: Always sequential (visual state dependency)
        - Same path parameter: Sequential (file system dependency)
        - Read-only tools: Can parallelize
        - Write operations: Sequential
        """
Execution Strategy:
Tool Calls: [screenshot, bash(ls), bash(pwd), click]
↓
Dependency Analysis:
- screenshot → click (computer tool dependency)
- bash(ls), bash(pwd) (independent, read-only)
↓
Execution Plan:
1. [screenshot] # Sequential
2. [bash(ls), bash(pwd)] # Parallel
3. [click] # Sequential
↓
Result: 30-50% speedup
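The grouping step above can be sketched as follows. The `ToolCall` shape and the `plan`/`execute` helpers are illustrative, not the real parallel_executor API; computer-tool calls are marked non-parallelizable because they depend on visual state:

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List

@dataclass
class ToolCall:
    name: str
    fn: Callable[[], Awaitable[str]]
    read_only: bool = False   # computer-tool calls stay False: visual state dependency

def plan(calls: List[ToolCall]) -> List[List[ToolCall]]:
    """Group adjacent read-only calls into parallel batches; all else runs alone."""
    batches: List[List[ToolCall]] = []
    for call in calls:
        if call.read_only and batches and all(c.read_only for c in batches[-1]):
            batches[-1].append(call)
        else:
            batches.append([call])
    return batches

async def execute(calls: List[ToolCall]) -> List[str]:
    """Run each batch with asyncio.gather, batches in order."""
    results: List[str] = []
    for batch in plan(calls):
        results += await asyncio.gather(*(c.fn() for c in batch))
    return results
```

For the example above, `[screenshot, bash(ls), bash(pwd), click]` yields the plan `[[screenshot], [ls, pwd], [click]]`: the two read-only bash calls run concurrently.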
8. Context Optimizer (agent/context_optimizer.py)
Implements 5 Anthropic context engineering patterns:
Pattern 1: Just-in-Time Retrieval
# Instead of: read_file("large_file.py")
# Use: grep("pattern", "large_file.py") | head -50
Pattern 2: Dynamic Compaction
class ContextBudget:
    OPTIMAL = 50_000                  # EXCELLENT attention quality
    ATTENTION_DEGRADATION = 100_000   # GOOD → DEGRADED
    WARNING = 150_000                 # DEGRADED → WARNING
    CRITICAL = 200_000                # WARNING → CRITICAL
Pattern 3: Structured Note-Taking
# Persistent memory outside context window
memory_tool.create("auth_findings", "OAuth2 flow uses refresh tokens...")
Pattern 4: Sub-Agent Compression
# 50k raw exploration → 2k structured summary
subagent = await manager.spawn(task="Find all API endpoints", type=EXPLORE)
Pattern 5: Attention Budget Monitoring
class AttentionQuality(Enum):
    EXCELLENT = "excellent"   # < 50k tokens
    GOOD = "good"             # < 100k tokens
    DEGRADED = "degraded"     # < 150k tokens
    WARNING = "warning"       # < 200k tokens
    CRITICAL = "critical"     # > 200k tokens
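The same thresholds expressed as a small classifier. This is an illustrative sketch; the real monitor lives in context_optimizer.py:

```python
from enum import Enum

class AttentionQuality(Enum):
    EXCELLENT = "excellent"
    GOOD = "good"
    DEGRADED = "degraded"
    WARNING = "warning"
    CRITICAL = "critical"

# Thresholds from ContextBudget: 50k / 100k / 150k / 200k tokens
_THRESHOLDS = [
    (50_000, AttentionQuality.EXCELLENT),
    (100_000, AttentionQuality.GOOD),
    (150_000, AttentionQuality.DEGRADED),
    (200_000, AttentionQuality.WARNING),
]

def attention_quality(token_count: int) -> AttentionQuality:
    """Map a context-window token count onto an attention-quality band."""
    for limit, quality in _THRESHOLDS:
        if token_count < limit:
            return quality
    return AttentionQuality.CRITICAL
```

At the platform's 7,500-token average, tasks sit comfortably in the EXCELLENT band.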
9. Tool Execution Guard (agent/tool_guard.py)
Safety and verification layer:
Features:
- Pre-execution Validation: Safety checks before tool execution
- Visual Verification: Confirms actions took effect (optional)
- Stuck Detection: Monitors for infinite loops
- Result Caching: 120-second TTL for cacheable operations
Speed Modes:
# Normal mode: Verification enabled (~0.5-1.0s per action)
python main.py "task"
# Fast mode: Skip verification (2-3x faster)
AGENT_FAST_MODE=1 python main.py "task"
10. Stuck Detection (agent/stuck_detection.py)
Prevents infinite loops and stuck patterns:
Detection Methods:
- Repeating same action consecutively
- Cycling between 2-3 actions
- No visual progress (identical screenshots)
- Slow progress (too few actions per time)
Recovery Strategies:
class StuckDetector:
    def check(self, action: ActionRecord) -> Optional[RecoverySuggestion]:
        """
        Returns suggestions like:
        - "Try a different approach"
        - "Scroll to see more content"
        - "Check if element exists"
        """
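A minimal sketch of the repeated-action heuristic, one of the four detection methods listed above. The real detector also compares screenshots and tracks timing; the class shape here is illustrative:

```python
from collections import deque
from typing import Optional

class StuckDetector:
    """Flag an action that repeats `max_repeats` times in a row."""

    def __init__(self, max_repeats: int = 3, window: int = 10):
        self.max_repeats = max_repeats
        self.history = deque(maxlen=window)   # bounded action history

    def check(self, action: str) -> Optional[str]:
        self.history.append(action)
        recent = list(self.history)[-self.max_repeats:]
        if len(recent) == self.max_repeats and len(set(recent)) == 1:
            return "Repeated the same action; try a different approach"
        return None
```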
Observability System
11. Unified Observability (agent/observability/)
Single interface for all observability concerns:
Configuration:
from agent.observability import get_observability, configure_observability
configure_observability(
    enable_metrics=True,
    enable_tracing=True,
    enable_streaming=True,
    metrics_port=9090,
    otlp_endpoint="localhost:4317",  # OpenTelemetry
)
Usage:
obs = get_observability()
async with obs.task_context("AUTO_CLOSE", "agent-123", "Close tickets"):
    obs.log_info("Starting task", tickets_count=10)
    with obs.tool_execution("computer", action="click"):
        # Automatically tracked
        pass
    obs.record_api_call(
        provider="anthropic",
        model="claude-opus-4-5-20251101",
        latency=2.5,
        input_tokens=1500,
        output_tokens=500,
    )
Components:
| Component | Purpose | Backend |
|---|---|---|
| Structured Logging | JSON logs with context | Python logging |
| Distributed Tracing | Request correlation | OpenTelemetry |
| Metrics | Performance tracking | Prometheus |
| Event Streaming | Real-time updates | SSE/WebSocket |
| Health Monitoring | System health | Circuit breakers |
Environment Variables:
METRICS_PORT=9090 # Prometheus metrics
OTLP_ENDPOINT=localhost:4317 # OpenTelemetry collector
LOG_FORMAT=json # json | human | compact
LOG_LEVEL=INFO # DEBUG | INFO | WARNING | ERROR
Infrastructure
12. Configuration Management (agent/config.py)
Centralized configuration with documented rationale:
Configuration Classes:
@dataclass
class ContextSettings:
    optimal_budget: int = 50_000           # From Anthropic research
    degradation_threshold: int = 100_000   # Attention starts degrading
    warning_threshold: int = 150_000       # Significant degradation
    max_context: int = 200_000             # Model limit

@dataclass
class ToolSettings:
    bash_timeout: int = 60          # Optimized from 120s
    typing_delay_ms: int = 8        # Delay per character (ms)
    screenshot_retention: int = 5   # Most recent screenshots

@dataclass
class BudgetSettings:
    input_price_per_million: float = 3.0    # Claude Opus 4.5
    output_price_per_million: float = 15.0
    cached_input_price: float = 0.30        # 90% savings
13. Exception Hierarchy (agent/exceptions.py)
Comprehensive error handling:
AgentError (base)
├── RetryableError
│ ├── NetworkError
│ ├── RateLimitError
│ ├── TimeoutError
│ └── ServiceUnavailableError
├── NonRetryableError
│ ├── ConfigurationError
│ ├── ValidationError
│ ├── SecurityError
│ └── AuthenticationError
├── BudgetError
│ ├── DailyBudgetExceededError
│ └── TaskBudgetExceededError
└── ToolError
├── ToolExecutionError
├── ToolTimeoutError
└── ToolValidationError
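The retryable/non-retryable split is what makes a generic retry policy possible: catch only `RetryableError` and let everything else propagate. A sketch with a trimmed-down hierarchy and a hypothetical `with_retries` helper:

```python
import asyncio

class AgentError(Exception): pass
class RetryableError(AgentError): pass
class NetworkError(RetryableError): pass
class NonRetryableError(AgentError): pass

async def with_retries(fn, attempts: int = 3, base_delay: float = 0.01):
    """Retry RetryableError subclasses with exponential backoff; re-raise others."""
    for attempt in range(attempts):
        try:
            return await fn()
        except RetryableError:
            if attempt == attempts - 1:
                raise                      # out of attempts
            await asyncio.sleep(base_delay * 2 ** attempt)
```

A `ValidationError` (non-retryable) would escape on the first attempt, while transient `NetworkError`s are absorbed up to the attempt limit.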
14. Health Monitoring (agent/health.py)
Production health checks:
class HealthChecker:
    async def check_anthropic_api(test_connectivity=True) -> HealthCheck
    async def check_system_resources() -> HealthCheck
    async def check_disk_space() -> HealthCheck
    # Circuit breaker for failing services
    circuit_breaker: CircuitBreaker
Health States:
- HEALTHY - All checks passing
- DEGRADED - Some checks failing, system operational
- UNHEALTHY - Critical failures
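A minimal circuit breaker consistent with the states above: it opens after a run of consecutive failures and lets a probe request through once a cooldown has elapsed. The threshold and cooldown values are illustrative, not the production defaults:

```python
import time
from typing import Optional

class CircuitBreaker:
    """Open after `threshold` consecutive failures; half-open after `cooldown` seconds."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True                                       # closed: all traffic
        if time.monotonic() - self.opened_at >= self.cooldown:
            return True                                       # half-open: one probe
        return False                                          # open: reject

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
```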
Dashboard Architecture
15. Backend (dashboard/backend/)
FastAPI REST API with async operations:
dashboard/backend/
├── app/
│ ├── main.py # FastAPI app factory
│ ├── api/ # REST API routes
│ │ ├── jobs.py # Job CRUD
│ │ ├── templates.py # Workflow templates
│ │ ├── artifacts.py # Screenshot/output storage
│ │ └── metrics.py # Performance tracking
│ ├── models/ # SQLAlchemy ORM models
│ ├── schemas/ # Pydantic schemas
│ ├── services/ # Business logic
│ ├── tasks/ # Celery workers
│ │ └── worker.py # Async agent execution
│ └── core/ # Configuration, database
└── migrations/ # Alembic schema versioning
Key Technologies:
- FastAPI with CORS
- SQLAlchemy async ORM
- PostgreSQL database
- Celery task queue
- Server-Sent Events (SSE)
- S3-compatible artifact storage (boto3)
16. Frontend (dashboard/frontend/)
Next.js 14 application:
dashboard/frontend/
├── app/ # App router pages
├── components/ # React components
├── hooks/ # Custom React hooks
└── lib/ # Utilities
Key Technologies:
- Next.js 14 with app router
- React Query for data fetching
- Tailwind CSS styling
- EventSource for real-time updates
Execution Flow
Complete Request Flow
1. User Command
│
▼
2. validate_environment()
├── Check ANTHROPIC_API_KEY
├── Check DISPLAY
└── Validate optional keys
│
▼
3. get_active_agents(instruction)
├── Parse keywords: "auto-close" → AUTO_CLOSE
└── Return: List[AgentConfig]
│
▼
4. continuous_loop(agents, instruction)
│
├──────────────────────────────────────┐
│ │
▼ ▼
5a. run_agent(AUTO_CLOSE) 5b. run_agent(SOCIAL_MEDIA)
│ │
▼ ▼
6. initialize_system_prompt() 6. initialize_system_prompt()
├── Fetch rules from StateSet (parallel)
└── Build system prompt
│
▼
7. sampling_loop()
│
├─── Send to Claude API ──────────────────────────┐
│ │ │
│ ▼ │
│ Claude Response │
│ ├── Text content │
│ └── Tool calls │
│ │ │
│ ▼ │
├─── ToolExecutionGuard.execute() │
│ ├── DependencyAnalyzer │
│ ├── ParallelToolExecutor │
│ ├── StuckDetection │
│ └── Verification (optional) │
│ │ │
│ ▼ │
│ Tool Results │
│ │ │
└─────────┴───────────────────────────────────────┘
│
▼ (loop until done)
│
▼
8. analyze_task_completion()
├── Check completion indicators
└── Return TaskStatus
│
▼
9. send_stripe_meter_event()
├── Token usage
└── Cost calculation
│
▼
10. shutdown_gracefully()
├── Cancel all tasks
└── Cleanup resources
Agent Types
Supported Agents
| Agent Type | Keywords | Purpose |
|---|---|---|
| AUTO_CLOSE | "auto-close", "ticket" | Close resolved support tickets |
| SOCIAL_MEDIA | "social media", "moderate" | Content moderation |
| LINKEDIN_MESSENGER | "linkedin", "outreach" | LinkedIn automation |
| SLACK_SUPPORT | "slack", "support" | Slack support automation |
| SHOPIFY | "shopify", "e-commerce" | E-commerce management |
| ONBOARDING | "onboarding", "setup" | User onboarding |
| STATESET_AGENTIC | "stateset", "custom" | Custom tasks |
Agent Configuration
@dataclass
class AgentConfig:
    org_id: str               # Organization identifier
    agent_id: str             # Unique agent identifier
    description: str          # Agent purpose
    capabilities: List[str]   # What the agent can do
    stripe_customer_id: str   # Billing identifier
Security Architecture
API Key Management
- All keys via environment variables
- Validation on startup
- No key transmission to external services
Tool Safety
- Directory traversal prevention in EditTool
- Prompt injection protection in MemoryTool
- Pre-execution validation via ToolExecutionGuard
- Agent memory isolation (per agent_id)
Sandbox Execution
- Tools run in controlled environment
- File system access limited by permissions
- Network access controlled by system
Benchmarks
| Metric | Value | Notes |
|---|---|---|
| Tokens/task | 7,500 | 95% reduction from 150k |
| Cost/task | $0.11 | 95% savings from $2.25 |
| Task duration | 30s | 33% faster with parallel |
| Parallel speedup | 30-50% | On multi-tool tasks |
| Typing speed | 8ms/char | Optimized from 50ms |
| Bash timeout | 60s | Optimized from 120s |
Cost Breakdown
| Operation | Price |
|---|---|
| Input tokens | $3.00/1M |
| Output tokens | $15.00/1M |
| Cached input | $0.30/1M (90% savings) |
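Per-task cost follows directly from these prices. A small estimator (`task_cost` is a hypothetical helper; uncached input, cached input, and output are each billed at their respective rates):

```python
# Prices from the table above, in USD per million tokens
INPUT_PRICE = 3.00
OUTPUT_PRICE = 15.00
CACHED_INPUT_PRICE = 0.30   # 90% discount on cache hits

def task_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimated task cost in USD; cached tokens get the discounted input rate."""
    uncached = input_tokens - cached_tokens
    return (uncached * INPUT_PRICE
            + cached_tokens * CACHED_INPUT_PRICE
            + output_tokens * OUTPUT_PRICE) / 1_000_000
```

This is why prompt caching dominates the savings: a fully cached million input tokens costs $0.30 instead of $3.00.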
File Organization
stateset-computer-use-agent/
├── main.py # Entry point, orchestration
├── agent/
│ ├── loop.py # Core sampling loop
│ ├── parallel_executor.py # Parallel tool execution
│ ├── context_optimizer.py # Context engineering
│ ├── tool_guard.py # Safety checks
│ ├── stuck_detection.py # Loop prevention
│ ├── verification.py # Visual verification
│ ├── subagent.py # Subagent spawning
│ ├── mcp_client.py # MCP integration
│ ├── structured_output.py # JSON schema validation
│ ├── checkpoint.py # State persistence
│ ├── metrics.py # Performance tracking
│ ├── skill_manager.py # Skill system
│ ├── config.py # Configuration
│ ├── exceptions.py # Error hierarchy
│ ├── logging_config.py # Structured logging
│ ├── health.py # Health monitoring
│ ├── observability/ # Unified observability
│ │ ├── unified.py
│ │ ├── tracing.py
│ │ └── metrics.py
│ └── tools/ # Tool implementations
│ ├── base.py
│ ├── collection.py
│ ├── computer.py
│ ├── bash.py
│ ├── edit.py
│ ├── memory.py
│ ├── agi.py
│ └── groups.py
├── dashboard/
│ ├── backend/ # FastAPI + Celery
│ └── frontend/ # Next.js 14
├── start-*.sh # Launch scripts
└── test_*.py # Test suites
Extension Points
Adding New Agents
- Define AgentConfig in AGENT_CONFIGS
- Add keyword detection in get_active_agents()
- Create completion indicators in analyze_task_completion()
Adding New Tools
- Inherit from BaseAnthropicTool
- Implement __call__ returning a ToolResult
- Add the tool to the version groups in agent/tools/groups.py
- Update tool traits if the tool is cacheable/read-only
Adding MCP Servers
await mcp_manager.add_server("custom-server", {
    "command": "npx",
    "args": ["-y", "@my/custom-mcp-server"],
    "env": {"API_KEY": "..."},
})
Quick Reference
Environment Variables
# Required
ANTHROPIC_API_KEY=sk-ant-api03-...
DISPLAY=:1
# Optional
STRIPE_API_KEY=sk_live_...
WORKSPACE_PATH=/path/to/workspace
AGENT_FAST_MODE=1 # Skip verification
METRICS_PORT=9090 # Prometheus
OTLP_ENDPOINT=localhost:4317 # OpenTelemetry
LOG_FORMAT=json # json | human | compact
LOG_LEVEL=INFO # DEBUG | INFO | WARNING | ERROR
Common Commands
# Run agents
python main.py "auto-close tickets"
python main.py "auto-close and social media" # Parallel
# With options
python main.py --effort medium "task"
python main.py --tool-search regex --defer-tool agi_agent "task"
# Dashboard
cd dashboard && docker compose up -d
This architecture provides a scalable, maintainable foundation for computer use automation with AI agents, implementing production-grade patterns for reliability, observability, and cost optimization.