CUGA Enterprise Agent on Katonic
An open-source generalist agent framework from IBM Research, purpose-built for enterprise automation. CUGA combines ReAct, CodeAct, and Planner-Executor patterns into a modular architecture enabling trustworthy, policy-aware, and composable automation across web interfaces, APIs, and enterprise systems.
The Computer Using Generalist Agent
CUGA is IBM Research's open-source AI agent designed for complex enterprise automation, from multi-step workflows to code execution and API orchestration. Ranked #1 on WebArena and AppWorld benchmarks.
Stop building agents from scratch. Start with a generalist.
Building domain-specific enterprise agents is complex: it takes orchestration, planning logic, safety policies, evaluation, and continuous improvement. CUGA abstracts this complexity with a Planner-Executor architecture built on LangGraph, which enables cyclic graphs for retry loops and dynamic re-planning.
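The retry and re-plan cycle can be sketched in plain Python. This is an illustration only, not CUGA's LangGraph implementation: `run_task`, `demo_plan`, `demo_execute`, and the two budget constants are hypothetical names, and the nested loops stand in for the cyclic edges a graph framework would provide.

```python
from typing import Callable, List

MAX_RETRIES = 2   # hypothetical: extra attempts allowed per step
MAX_REPLANS = 2   # hypothetical: times the planner may be asked again


def run_task(goal: str,
             plan: Callable[[str, List[str]], List[str]],
             execute: Callable[[str], bool]) -> List[str]:
    """Execute a plan step by step; retry failing steps, then re-plan."""
    log: List[str] = []
    failed: List[str] = []          # steps the planner should avoid next time
    for _ in range(MAX_REPLANS + 1):
        steps = plan(goal, failed)
        completed_all = True
        for step in steps:
            ok = False
            for attempt in range(MAX_RETRIES + 1):   # retry loop
                ok = execute(step)
                outcome = "ok" if ok else "fail"
                log.append(f"{step}: {outcome} (attempt {attempt + 1})")
                if ok:
                    break
            if not ok:
                failed.append(step)
                completed_all = False
                break                                # cycle back to the planner
        if completed_all:
            log.append("done")
            return log
    log.append("gave up")
    return log


def demo_plan(goal: str, failed: List[str]) -> List[str]:
    # Prefer the API route; fall back to the web UI if the API step failed.
    if "fetch_via_api" in failed:
        return ["fetch_via_web", "update_page"]
    return ["fetch_via_api", "update_page"]


def demo_execute(step: str) -> bool:
    return step != "fetch_via_api"   # simulate an API that is down


result = run_task("get top account by revenue", demo_plan, demo_execute)
```

In this toy run the API step fails three times, the planner switches to the web route, and the task completes. A real deployment would also carry state between plans rather than restarting every step.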
See CUGA in Action
Watch how CUGA automates complex enterprise workflows
get top account by revenue from digital sales, then add it to current page
Hybrid task execution on web and API
Watch CUGA pause for human approval during critical decision points
Example Task: get best accounts
Benchmark Results
Ranked #1 on both WebArena and AppWorld leaderboards - beating OpenAI Operator, Anthropic, and Google.
WebArena Leaderboard
Why this matters: CUGA outperforms OpenAI's Operator by 3.6 percentage points while being fully open source. For regulated industries that need to inspect agent decisions, this is the only production-ready option.
Enterprise Pilot Results
Real metrics from IBM's BPO Talent Acquisition pilot deployment - not synthetic benchmarks.
"CUGA saved 20-30 minutes of manual dashboard comparisons per query. It freed time for actual decision-making." - BPO Talent Acquisition Architects, IBM Consulting
Research Papers
Explore the research behind CUGA's architecture and enterprise deployment
Towards Enterprise-Ready Computer Using Generalist Agent
Our evolutionary approach to building enterprise-ready agentic systems, achieving state-of-the-art performance on WebArena and AppWorld through systematic evaluation, analysis, and refinement.
From Benchmarks to Business Impact: Deploying IBM Generalist Agent in Enterprise Production
Evidence from deploying CUGA in enterprise production, including architectural modifications for auditability, safety, and governance.
ST-WEBAGENTBENCH: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents
A configurable benchmark suite with 222 tasks for evaluating web agent safety and trustworthiness across enterprise scenarios, introducing the Completion Under Policy (CuP) metric.
Self-Healing Reliability with ALTK
CUGA's secret weapon: the Agent Lifecycle Toolkit (ALTK) provides the "immune system" that turns fragile prototypes into resilient enterprise systems. Reduces parsing-related failures by 33%+ in production.
The ALTK Philosophy
Reliability cannot be "prompted" into an LLM; it must be engineered around it. ALTK intervenes at three critical stages: Pre-LLM (before reasoning), Pre-Tool (before execution), and Post-Tool (after results). This cycle of prompt → call → validation → reflection/replan reduced parsing-related failures by more than one-third in IBM pilot runs.
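The three intervention points can be pictured as hooks around a single model call. The sketch below is an assumption-laden illustration, not the ALTK API: `run_cycle`, `first_error`, and all the `demo_*` and check functions are hypothetical names, and the "LLM" is a stub that corrects itself once it receives feedback.

```python
from typing import Any, Callable, Dict, List, Optional

State = Dict[str, Any]


def first_error(checks: List[Callable[[State], Optional[str]]],
                state: State) -> Optional[str]:
    """Return the first error message reported by any check, else None."""
    for check in checks:
        error = check(state)
        if error:
            return error
    return None


def run_cycle(state: State, llm, tool, pre_llm, pre_tool, post_tool,
              max_cycles: int = 3) -> State:
    for _ in range(max_cycles):
        for prepare in pre_llm:                  # Pre-LLM: shape the prompt
            prepare(state)
        state["call"] = llm(state)               # the model proposes a tool call
        error = first_error(pre_tool, state)     # Pre-Tool: validate before running
        if error is None:
            state["result"] = tool(state["call"])
            error = first_error(post_tool, state)  # Post-Tool: inspect the result
        if error is None:
            state["status"] = "ok"
            return state
        state["feedback"] = error                # reflect, then go around again
    state["status"] = "failed"
    return state


def add_instructions(state: State) -> None:
    state["prompt"] = "Use only documented parameters. " + state["task"]


def demo_llm(state: State) -> Dict[str, Any]:
    # Stub model: omits the id on the first pass, fixes it after feedback.
    return {"account_id": 7 if "feedback" in state else None}


def require_account_id(state: State) -> Optional[str]:
    ok = state["call"].get("account_id") is not None
    return None if ok else "account_id is missing"


def demo_tool(call: Dict[str, Any]) -> Dict[str, Any]:
    return {"id": call["account_id"], "name": "Acme"}


def require_result(state: State) -> Optional[str]:
    return None if state["result"] else "tool returned nothing"


final = run_cycle({"task": "get top account"}, demo_llm, demo_tool,
                  [add_instructions], [require_account_id], [require_result])
```

Here the Pre-Tool check rejects the first call before any tool runs, and the second cycle succeeds. Keeping the checks outside the model is the point: a bad call is caught deterministically instead of relying on the prompt.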
Spotlight
Steers the model's attention toward critical instructions. Prevents "instruction drift" in long contexts by dynamically biasing attention logits, improving constraint adherence by 26%+.
SPARC
Semantic Pre-execution Analysis for Reliable Calls. Validates generated arguments against OpenAPI specs before execution, catching hallucinated parameters before they cause failures.
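As a rough picture of pre-execution validation, the sketch below checks a generated argument dict against a simplified OpenAPI-style parameter list. It is not SPARC's implementation: `validate_args` and `PY_TYPES` are hypothetical, and only unknown names, missing required parameters, and basic scalar types are covered.

```python
from typing import Any, Dict, List

# Map OpenAPI scalar type names to Python types (simplified).
PY_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_args(parameters: List[Dict[str, Any]],
                  args: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the call looks valid."""
    errors: List[str] = []
    known = {p["name"]: p for p in parameters}

    for name in args:
        if name not in known:                     # likely hallucinated
            errors.append(f"unknown parameter: {name}")

    for name, spec in known.items():
        if name not in args:
            if spec.get("required", False):
                errors.append(f"missing required parameter: {name}")
            continue
        type_name = spec.get("schema", {}).get("type")
        expected = PY_TYPES.get(type_name)
        value = args[name]
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_stray_bool = expected is not bool and isinstance(value, bool)
        if expected and (not isinstance(value, expected) or is_stray_bool):
            errors.append(f"wrong type for {name}: expected {type_name}")
    return errors


# Hypothetical parameter list for an account-lookup operation.
parameters = [
    {"name": "account_id", "in": "path", "required": True,
     "schema": {"type": "integer"}},
    {"name": "limit", "in": "query", "schema": {"type": "integer"}},
]

problems = validate_args(parameters, {"account_id": "42", "top_n": 5})
```

For the call above, `problems` flags the invented `top_n` parameter and the string passed where an integer is expected, so the agent can correct the call before it reaches the API.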
Refraction
Syntax repair engine. Intercepts minor code errors (missing brackets, indentation) and repairs them deterministically, saving costly LLM inference cycles.
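One deterministic repair of this kind, closing brackets that were left open, can be sketched with the standard library alone. This is not Refraction's code: `close_brackets` and `repair` are hypothetical helpers, the string handling is deliberately naive (no triple quotes, no comments), and indentation repair is out of scope.

```python
import ast
from typing import Optional

PAIRS = {"(": ")", "[": "]", "{": "}"}


def close_brackets(code: str) -> str:
    """Append closers for brackets still open at the end of the code."""
    stack = []      # closers we still owe, innermost last
    quote = None    # the quote character we are inside, if any
    prev = ""
    for ch in code:
        if quote:
            if ch == quote and prev != "\\":
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch in PAIRS:
            stack.append(PAIRS[ch])
        elif stack and ch == stack[-1]:
            stack.pop()
        prev = ch
    return code + "".join(reversed(stack))


def repair(code: str) -> Optional[str]:
    """Return code that parses, or None if this simple fix is not enough."""
    try:
        ast.parse(code)
        return code                      # already valid, nothing to do
    except SyntaxError:
        candidate = close_brackets(code.rstrip())
    try:
        ast.parse(candidate)
    except SyntaxError:
        return None                      # hand back to the LLM
    return candidate


fixed = repair("total = sum([1, 2, 3")
```

Because the repaired candidate is re-parsed with `ast.parse` before it is accepted, the helper either returns valid Python or declines, and only the declined cases need another model call.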
JSON Processor
Auto-generates extraction code for "fat" API payloads. Filters megabytes of JSON to relevant fields only, reducing token costs and improving reasoning accuracy.
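The effect of such extraction code can be illustrated with a small hand-written projection. In the sketch below, `pick` and `slim` are hypothetical helpers and the payload is invented; the source describes this code as auto-generated, which this example does not attempt.

```python
import json
from typing import Any, Dict, List


def pick(record: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'revenue.amount'; None if it is absent."""
    value: Any = record
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def slim(payload: List[Dict[str, Any]], fields: List[str]) -> List[Dict[str, Any]]:
    """Keep only the requested fields from each record."""
    return [{field: pick(record, field) for field in fields} for record in payload]


# Invented "fat" payload: two useful fields buried under bulky history data.
payload = [
    {"id": 1, "name": "Acme", "revenue": {"amount": 900, "currency": "USD"},
     "history": ["event"] * 500},
    {"id": 2, "name": "Globex", "revenue": {"amount": 750, "currency": "USD"},
     "history": ["event"] * 500},
]

compact = slim(payload, ["name", "revenue.amount"])
saved_chars = len(json.dumps(payload)) - len(json.dumps(compact))
```

Only `compact` would be placed in the model's context; `saved_chars` gives a crude sense of how much prompt space the projection frees up.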
RAG Repair
Self-healing infrastructure. When tools fail, RAG Repair searches documentation to find solutions, mimicking a developer "Googling the error" and generating corrected commands.
Silent Review
Semantic auditor. Detects "silent failures" where APIs return 200 OK but with empty or error content, prompting the agent to try alternative strategies.
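A rule-based approximation shows what a silent failure looks like in practice. The `review` function and the key names it inspects (`error`, `errors`, `data`, `results`, `items`) are assumptions made for this sketch; the source describes Silent Review as a semantic auditor and does not publish its rules here.

```python
from typing import Any, Optional


def review(status: int, body: Any) -> Optional[str]:
    """Return None if the response looks usable, otherwise a short reason."""
    if not 200 <= status < 300:
        return f"HTTP {status}"
    if body in (None, "", [], {}):
        return "empty body"
    if isinstance(body, dict):
        if body.get("error") or body.get("errors"):
            return "error field in body"
        for key in ("data", "results", "items"):   # assumed container keys
            if key in body and not body[key]:
                return f"empty '{key}'"
    return None


# A 200 OK that carries no usable content.
reason = review(200, {"results": []})
```

When `review` returns a reason, the agent can treat the call as failed and try another strategy instead of reasoning over an empty result.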
Configurable Reasoning Modes
Not every task needs deep planning. Trade off latency, cost, and accuracy based on your requirements.
Fast Heuristics Mode
Lighter prompting with faster models (Granite, GPT-3.5). Bypasses deep planning for routine tasks.
Deep Planning Mode
"System 2" thinking. Extensive task decomposition, self-reflection, and multi-step planning.
How CUGA Works on Katonic
Deploy, run, and scale CUGA on your infrastructure using Katonic's sovereign AI platform.
Deploy Models on Ops
Deploy open source LLMs like LLaMA, Mistral, or Granite on Katonic Ops. Run on NVIDIA GPUs or Groq LPUs with enterprise-grade inference that powers CUGA.
Deploy CUGA in Studio
Launch and deploy CUGA from Katonic AI Studio. Connect to your deployed models, configure data sources, and define the tools CUGA can access.
Access via ACE Co-pilot
CUGA becomes available as an intelligent agent in ACE Co-pilot. Your teams interact with CUGA through natural language - complex tasks are automatically handled.
Configure with Langflow
Use Langflow from the Katonic App Store to visually configure and customize CUGA's workflows. Build and modify agent pipelines without code.
Key Features & Capabilities
Everything enterprises need to deploy autonomous AI agents at scale, from structured planning to policy enforcement.
High-Performing Generalist Agent
Combines best-of-breed agentic patterns (planner-executor, CodeAct) with structured planning and smart variable management to prevent hallucination and handle complexity.
Human-in-the-Loop Controls
Configure policy-aware instructions and approval gates. Business defines where autonomy is permitted and where human approval is mandatory.
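An approval gate of this kind reduces to a policy lookup in front of every action. The sketch below is hypothetical throughout: `Action`, `POLICY`, `run_action`, and the action names are invented for illustration and do not reflect CUGA's configuration format.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class Action:
    name: str
    args: Dict[str, Any] = field(default_factory=dict)


# Hypothetical policy: which actions may run unattended.
POLICY = {
    "read_account": "auto",
    "update_record": "approve",
    "delete_record": "approve",
}


def run_action(action: Action,
               execute: Callable[[Action], str],
               ask_human: Callable[[Action], bool],
               policy: Dict[str, str] = POLICY) -> str:
    """Run the action, pausing for human approval where the policy requires it."""
    # Actions missing from the policy default to the safe side.
    mode = policy.get(action.name, "approve")
    if mode == "approve" and not ask_human(action):
        return "blocked"
    return execute(action)


def always_deny(action: Action) -> bool:
    return False


def always_allow(action: Action) -> bool:
    return True


def do_it(action: Action) -> str:
    return f"executed {action.name}"


read_outcome = run_action(Action("read_account"), do_it, always_deny)
update_outcome = run_action(Action("update_record", {"id": 7}), do_it, always_deny)
```

Defaulting unknown actions to `approve` is a design choice made in this sketch so that a newly added tool cannot run unattended until someone classifies it.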
API/Tool Hub Integration
Onboard new APIs in hours, not weeks. A centralized hub reduces OpenAPI specs to LLM-friendly schemas with strict JSON validation.
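To make the idea of an LLM-friendly schema concrete, the sketch below reduces one OpenAPI operation object to the few fields a model needs to call it. The `minimize` function and its output shape are assumptions for illustration, not the hub's real format; the example operation is invented.

```python
from typing import Any, Dict


def minimize(path: str, method: str, operation: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce an OpenAPI operation to name, description, and parameter types."""
    params = operation.get("parameters", [])
    return {
        "name": operation.get("operationId", f"{method}_{path}"),
        "description": operation.get("summary", ""),
        "parameters": {
            p["name"]: p.get("schema", {}).get("type", "string") for p in params
        },
        "required": [p["name"] for p in params if p.get("required")],
    }


# Invented operation with the kind of detail a model rarely needs.
operation = {
    "operationId": "getTopAccounts",
    "summary": "List accounts ordered by revenue",
    "tags": ["sales"],
    "parameters": [
        {"name": "region", "in": "query", "required": True,
         "schema": {"type": "string"}},
        {"name": "limit", "in": "query", "schema": {"type": "integer"}},
    ],
    "responses": {"200": {"description": "OK"}, "401": {"description": "Unauthorized"}},
}

tool_schema = minimize("/accounts/top", "get", operation)
```

Tags, response catalogs, and other spec detail are dropped, which keeps the tool description short enough to sit in a prompt alongside many others.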
Computer Use (Browser Agent)
Navigate web interfaces via DOM interaction. Combines Browser Planner, Action Agent, and QA Agent for visual parsing.
Open Source & Model Agnostic
Apache 2.0 license with no vendor lock-in. Choose your LLM: GPT-4, Granite, LLaMA, or Mistral. CUGA can even be a tool for other agents.
Full Provenance & Audit Trails
Every response includes API paths, parameters, and computation logs. 95% of pilot responses had complete audit trails for compliance.
How CUGA Compares to Production AI Agents
The only open-source agent that beats the tech giants on benchmarks while giving you full control.
| Feature | IBM CUGA on Katonic | OpenAI Operator | Anthropic Computer Use | Google Mariner |
|---|---|---|---|---|
| WebArena Score | 61.7% (#1) | 58.1% | — | — |
| Open Source | ✓ Apache 2.0 | ✗ Proprietary | ✗ Proprietary | ✗ Proprietary |
| Data Sovereignty | ✓ On-premise / Your cloud | ✗ OpenAI servers | ✗ Anthropic servers | ✗ Google servers |
| Auditability | ✓ Glass-Box (full logs) | ✗ Black Box | ✗ Black Box | ✗ Black Box |
| Model Choice | ✓ Any LLM (GPT, LLaMA, etc.) | ✗ GPT-4 only | ✗ Claude only | ✗ Gemini only |
| Enterprise HITL | ✓ Configurable gates | Limited | Limited | Limited |
| Self-Healing (ALTK) | ✓ Native | ✗ No | ✗ No | ✗ No |
| Best For | Regulated enterprises | Consumer tasks | Developer tools | Browser research |
Fully Open Source
CUGA is released under the Apache 2.0 license. Inspect the code, contribute improvements, and deploy with confidence knowing there's no black box.
Ready to See CUGA in Action?
Get the enterprise AI agent running on your infrastructure. Full data sovereignty. No vendor lock-in. Reach out for a personalized demo.