March 2026

Agentic AI Design Patterns for Enterprise

Single Orchestrator Architecture: The Mid-Market Default

A single orchestrator with multiple MCP servers covers roughly 90% of mid-market use cases without the complexity of multi-agent communication protocols. This architecture provides simpler governance, unified audit trails, and consolidated RBAC while handling the full range of enterprise workflows that companies with 100-500 employees actually need.

The pattern is straightforward: one orchestrator agent coordinates with multiple MCP servers, each scoped to a specific backend system. When a user requests "compare project budgets against current schedules," the orchestrator makes sequential tool calls to NetSuite MCP and Procore MCP, then synthesizes the results. No agent-to-agent communication required — just a single Claude instance with access to multiple tools.

Teams deploying this single-orchestrator pattern typically see far faster implementation timelines than with multi-agent architectures, primarily because governance complexity grows rapidly with the number of independent agents. One orchestrator means one point of control for permissions, rate limiting, and audit logging.

The architecture diagram is simple by design:

User Request → Orchestrator Agent
├── NetSuite MCP Server
├── Procore MCP Server
├── BambooHR MCP Server
└── Custom Database MCP
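
Under this architecture, the orchestrator is essentially a loop over tool calls. A minimal sketch, with stub functions standing in for the NetSuite and Procore MCP tool calls (tool names, arguments, and return values are illustrative, and the final synthesis step would be a Claude call in practice):

```python
def netsuite_get_budgets(project_id: str) -> dict:
    # Stub for a NetSuite MCP tool call.
    return {"project": project_id, "budget": 250_000}

def procore_get_schedule(project_id: str) -> dict:
    # Stub for a Procore MCP tool call.
    return {"project": project_id, "weeks_remaining": 14}

TOOLS = {
    "netsuite.get_budgets": netsuite_get_budgets,
    "procore.get_schedule": procore_get_schedule,
}

def orchestrate(project_id: str) -> dict:
    """Sequentially call each backend, then synthesize one answer."""
    budget = TOOLS["netsuite.get_budgets"](project_id)
    schedule = TOOLS["procore.get_schedule"](project_id)
    # In a real deployment this synthesis would itself be a Claude call.
    return {
        "project": project_id,
        "budget": budget["budget"],
        "weeks_remaining": schedule["weeks_remaining"],
    }

result = orchestrate("P-1042")
```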

Multi-agent systems make sense at Google-scale organizations with hundreds of specialized agents built by different teams. At mid-market scale, you have 3-7 active workflows built by a single team. Discovery protocols and inter-agent communication add overhead without solving a real problem — you know what agents exist because you built them.

The Model Context Protocol is a natural fit for this single-orchestrator pattern, which makes it a sensible starting point for enterprise deployments. MCP servers expose their capabilities through tool descriptions, giving the orchestrator everything it needs to route requests appropriately.

Prompt Chaining Pattern: Sequential Task Decomposition

Break complex tasks into sequential Claude calls with structured handoffs and gate validation at each step. This pattern isolates failures, improves debuggability, and allows different prompting strategies at each stage without overwhelming a single context window.

The prompt chaining pattern works through structured decomposition: identify discrete steps, define clear inputs and outputs for each step, implement validation gates between steps, and maintain state across the chain. For invoice processing, this becomes: document extraction → data validation → approval routing → status updates, with each step handled by a separate Claude call optimized for that specific task.

Gate validation is critical: each step must verify its output meets quality thresholds before passing data to the next step. In practice, validation gates sharply reduce downstream error propagation compared to single-shot processing, because invalid intermediate results get caught and reprocessed rather than compounding through the entire chain.

Consider expense report processing across 200 reports. Single-shot processing attempts to handle extraction, validation, categorization, and approval routing in one massive prompt. Chaining breaks this into: extract structured data from each report, validate extracted amounts against receipt images, categorize expenses by department policy, route to appropriate approvers based on amount thresholds. Each step can use different models — Haiku for extraction, Sonnet for validation, Haiku again for routing.
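The chain above can be sketched as a pipeline of small functions, one per Claude call, with a gate between steps. Everything here is a stub: the extraction and categorization logic stand in for real model calls, and the gate check is illustrative:

```python
def extract(report: str) -> dict:
    # Stand-in for a focused Haiku extraction call.
    amount = float(report.split("$")[1])
    return {"amount": amount, "raw": report}

def gate_extract(data: dict) -> bool:
    # Validation gate: reject nonsense before it reaches the next step.
    return data["amount"] > 0

def categorize(data: dict) -> dict:
    # Stand-in for a categorization call against department policy.
    data["category"] = "travel" if "flight" in data["raw"] else "other"
    return data

def run_chain(report: str) -> dict:
    data = extract(report)
    if not gate_extract(data):
        raise ValueError("extraction gate failed; flag for reprocessing")
    return categorize(data)

result = run_chain("flight to Austin $412.50")
```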

The cost profile favors chaining for complex workflows. Haiku's per-token price is a small fraction of Sonnet's, so four focused Haiku calls typically cost far less than one complex Sonnet call while producing more reliable results. The validation gates catch errors early, preventing expensive rework later in the process.

State management between chain steps requires careful design. Each step persists its output to a workflow state table, along with metadata about processing time, token usage, and validation status. The next step reads from this state rather than relying on in-memory handoffs, making the entire chain resilient to failures and restarts.

Parallelization and Batch Processing Patterns

Fan-out processing with asyncio.gather handles independent tasks simultaneously, reducing total processing time by 75% while maintaining individual error handling. The pattern works when tasks have no dependencies on each other — processing 100 expense reports, analyzing quarterly performance across 20 departments, or validating data consistency across multiple systems.

The implementation uses Python's asyncio.gather to execute multiple Claude API calls concurrently, with each task wrapped in proper exception handling. When processing fails for individual items (3 of 10 expense reports have unreadable receipts), the pattern handles partial failures gracefully rather than stopping the entire batch. Successful items continue through the pipeline while failed items get flagged for manual review.
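A minimal sketch of this fan-out with asyncio.gather(return_exceptions=True), using a stub in place of the real Claude API call (the simulated failure condition is illustrative):

```python
import asyncio

async def process_report(report_id: int) -> dict:
    # Stub for a real Claude API call against one expense report.
    if report_id % 4 == 0:              # simulate an unreadable receipt
        raise ValueError(f"unreadable receipt in report {report_id}")
    await asyncio.sleep(0)              # stand-in for network latency
    return {"id": report_id, "status": "processed"}

async def process_batch(report_ids: list[int]) -> tuple[list, list]:
    results = await asyncio.gather(
        *(process_report(r) for r in report_ids),
        return_exceptions=True,         # keep going on individual failures
    )
    ok = [r for r in results if isinstance(r, dict)]
    failed = [r for r in results if isinstance(r, Exception)]
    return ok, failed                   # failed items go to manual review

ok, failed = asyncio.run(process_batch(list(range(1, 11))))
```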

Anthropic's Batch API provides a 50% cost discount for workloads that can tolerate 24-hour processing delays. For overnight reports, weekly analysis, or month-end reconciliation, batch processing delivers significant cost savings. The pattern becomes: submit batch jobs in the evening, retrieve results the next morning, with automatic retry logic for any failed items in the batch.
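Building the batch itself is mostly payload construction. The sketch below assembles requests in the shape the Message Batches API expects (a custom_id plus the usual message params); the model alias and prompt are placeholders, and the actual submit call appears only as a comment:

```python
def build_batch_requests(reports: list[dict]) -> list[dict]:
    # One request per report; custom_id ties results back to source items.
    return [
        {
            "custom_id": f"report-{r['id']}",
            "params": {
                "model": "claude-3-5-haiku-latest",   # placeholder alias
                "max_tokens": 1024,
                "messages": [
                    {"role": "user",
                     "content": f"Extract line items from: {r['text']}"},
                ],
            },
        }
        for r in reports
    ]

requests = build_batch_requests(
    [{"id": 1, "text": "taxi $23"}, {"id": 2, "text": "hotel $310"}]
)
# Submitting and retrieving would look roughly like:
#   batch = client.messages.batches.create(requests=requests)
#   ...next morning: client.messages.batches.results(batch.id)
```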

Real performance data shows the impact: processing 200 expense reports sequentially takes 45 minutes at standard API rates. Parallel processing with asyncio.gather reduces this to 12 minutes. Batch API processing completes overnight at half the cost, making it ideal for scheduled workflows that don't require real-time results.

Resource management becomes critical at scale. The pattern implements connection pooling, rate limit awareness, and circuit breakers to prevent overwhelming downstream systems. When NetSuite can only handle 5 concurrent API calls, the parallelization pattern respects this limit while still processing other tasks concurrently.
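An asyncio.Semaphore is the simplest way to express a cap like 'at most 5 concurrent NetSuite calls.' The sketch below tracks peak concurrency to show the cap holding; the limit and the stubbed call are illustrative:

```python
import asyncio

async def netsuite_call(i: int, sem: asyncio.Semaphore, stats: dict) -> int:
    async with sem:                      # wait here if 5 calls are in flight
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)        # stand-in for the API round trip
        stats["in_flight"] -= 1
    return i

async def main() -> int:
    sem = asyncio.Semaphore(5)           # NetSuite's concurrency cap
    stats = {"in_flight": 0, "peak": 0}
    await asyncio.gather(*(netsuite_call(i, sem, stats) for i in range(20)))
    return stats["peak"]

peak_concurrency = asyncio.run(main())
```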

Error recovery distinguishes professional implementations from prototypes. The pattern maintains a retry queue for failed items, with exponential backoff and categorized failure types. Network timeouts get retried immediately, rate limit errors get delayed, and validation failures get routed to human review. This ensures maximum throughput while maintaining data integrity.
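The routing logic can be sketched as a small classifier over failure types. The exception classes, delays, and destinations below are illustrative stand-ins for whatever taxonomy a real pipeline defines:

```python
class NetworkTimeout(Exception): pass
class RateLimited(Exception): pass
class ValidationFailure(Exception): pass

def backoff_delay(attempt: int, base: float = 1.0) -> float:
    """Exponential backoff: 1s, 2s, 4s, ..."""
    return base * (2 ** attempt)

def route_failure(exc: Exception, attempt: int) -> tuple[str, float]:
    if isinstance(exc, NetworkTimeout):
        return ("retry", 0.0)                     # transient: retry now
    if isinstance(exc, RateLimited):
        return ("retry", backoff_delay(attempt))  # retry after a delay
    return ("human_review", 0.0)                  # never auto-retry

decision = route_failure(RateLimited(), attempt=2)
```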

Producer-Critic Quality Loops

Two-agent quality control uses separate Claude instances for generation and evaluation, improving output quality by 40% for high-stakes deliverables. The producer generates initial output optimized for completeness and creativity. The critic evaluates against specific quality criteria and provides structured feedback. The producer refines based on critic feedback before final delivery.

This pattern matters when output quality directly impacts business outcomes — board reports, client proposals, regulatory filings, or customer communications. In practice, producer-critic loops catch most factual errors and significantly improve clarity compared to single-shot generation.

The implementation uses different system prompts for each role. The producer prompt emphasizes thoroughness and creative problem-solving: "Generate a comprehensive analysis focusing on insights and recommendations." The critic prompt emphasizes evaluation and improvement: "Review this analysis for accuracy, completeness, and clarity. Identify specific areas for improvement."

Cost analysis shows when the pattern pays off. Producer-critic doubles token usage compared to single-shot generation, but reduces revision cycles by 60% for complex deliverables. For a board report requiring three human revision rounds, producer-critic typically eliminates one full revision cycle, saving more in human time than it costs in additional tokens.

The feedback schema structure matters significantly. Unstructured critic feedback ("this could be better") provides little value. Structured feedback with specific categories — factual accuracy, logical flow, supporting evidence, clarity of recommendations — gives the producer actionable guidance for improvement.

Implementation requires careful prompt engineering to prevent the critic from being too harsh (rejecting good output unnecessarily) or too lenient (approving output that needs improvement). The sweet spot uses specific evaluation criteria derived from real examples of excellent output in that domain.

Quality measurement validates the pattern's effectiveness. Track first-draft acceptance rates, revision cycles required, and stakeholder satisfaction scores. Producer-critic implementations typically show 40% improvement in first-draft quality and 25% reduction in total revision time.

State Management and Human-in-the-Loop Patterns

Persistent workflow state enables async approval processes and human oversight without losing progress. The pattern handles three state scopes — session, user, and organization — with different persistence requirements and access controls for each level.

Session state covers single conversation context: current task progress, intermediate results, user preferences for this interaction. This state expires when the conversation ends but must persist across multiple Claude calls within the session. User state covers preferences, role permissions, and personal context that carries across sessions. Organization state covers shared configuration, approval workflows, and system-wide settings that apply to all users.

The HITL-to-HOTL progression represents increasing automation maturity. Human-in-the-loop requires approval for every action — the AI suggests, humans decide, progress stops until approval. Human-on-the-loop monitors completed actions and can intervene when needed, but AI proceeds autonomously for routine tasks. Human-over-the-loop sets policies and exception criteria, with escalation only for genuine edge cases.

Async approval workflows require careful state persistence. When AI completes initial analysis but needs CFO approval before proceeding, the workflow state includes: analysis results, approval request details, escalation timestamp, timeout handling. The system resumes automatically when approval arrives, or escalates appropriately if approval times out.
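A sketch of the resume logic over a persisted state record. The field names, the 48-hour timeout, and the three outcomes are illustrative; in production the dict would live in a workflow state table:

```python
APPROVAL_TIMEOUT_S = 48 * 3600           # escalate after 48 hours

def request_approval(analysis: dict, now: float) -> dict:
    # This dict stands in for a persisted workflow-state row.
    return {
        "step": "awaiting_cfo_approval",
        "analysis": analysis,
        "requested_at": now,
        "approved": None,
    }

def resume(state: dict, now: float) -> str:
    if state["approved"] is True:
        return "continue"                # approval arrived: pick up the work
    if now - state["requested_at"] > APPROVAL_TIMEOUT_S:
        return "escalate"                # timed out: route to a human
    return "wait"                        # still pending: check again later

state = request_approval({"variance": 0.12}, now=0.0)
pending = resume(state, now=3_600.0)     # one hour in: still waiting
state["approved"] = True
resumed = resume(state, now=3_600.0)
timed_out = resume(request_approval({}, now=0.0), now=50 * 3_600.0)
```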

Stateful workflows also show substantially higher completion rates than stateless implementations, because users can return to complex tasks without losing progress. This particularly matters for workflows spanning multiple days or requiring coordination across departments.

The three-zone escalation model defines clear boundaries: Zone 1 (green) proceeds autonomously, Zone 2 (yellow) requires notification but not approval, Zone 3 (red) stops for human decision. The boundaries shift as the system learns from successful autonomous actions and stakeholder comfort increases.
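As a sketch, zone classification can be as simple as thresholds over an action's risk measure. The dollar limits below are illustrative policy settings, not fixed values:

```python
GREEN_LIMIT = 1_000      # Zone 1 ceiling: proceed autonomously
YELLOW_LIMIT = 10_000    # Zone 2 ceiling: proceed, but notify

def classify_action(amount: float) -> str:
    if amount <= GREEN_LIMIT:
        return "green"   # autonomous
    if amount <= YELLOW_LIMIT:
        return "yellow"  # notification, no approval required
    return "red"         # stop and wait for a human decision
```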

Database design for workflow state uses JSONB columns in PostgreSQL for flexibility while maintaining queryability. Each workflow record includes: current step, completed actions, pending approvals, escalation history, and metadata about processing time and resource usage. This supports both operational monitoring and continuous improvement analysis.

Error Handling and Graceful Degradation

Three-phase error management — detection, handling, recovery — prevents cascading failures while maintaining user experience. Detection identifies different error types: API timeouts, validation failures, permission errors, data inconsistencies. Handling provides appropriate responses for each error type. Recovery restores system state and continues processing where possible.

State rollback becomes critical when partial failures occur mid-process. If updating NetSuite succeeds but the corresponding Procore update fails, the system must either complete both operations or revert both to maintain consistency. This requires transactional thinking across API boundaries, with compensation patterns for systems that don't support traditional rollbacks.
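A minimal compensation sketch: each completed step registers an undo action, and a failure replays the undos in reverse. Both backend calls are stubs, and the shared log exists only to make the behavior visible:

```python
log: list[str] = []                      # visible record of what ran

def netsuite_update() -> None:
    log.append("netsuite_updated")       # stub for the NetSuite API call

def netsuite_revert() -> None:
    log.append("netsuite_reverted")      # compensating action for the above

def procore_update(ok: bool) -> None:
    if not ok:
        raise RuntimeError("Procore update failed")
    log.append("procore_updated")        # stub for the Procore API call

def sync_budgets(procore_ok: bool) -> bool:
    compensations = []
    try:
        netsuite_update()
        compensations.append(netsuite_revert)
        procore_update(ok=procore_ok)
        return True
    except RuntimeError:
        for undo in reversed(compensations):   # unwind in reverse order
            undo()
        return False

succeeded = sync_budgets(procore_ok=False)
```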

Graceful degradation maintains partial functionality when components fail. If the NetSuite MCP server becomes unavailable, workflows can continue with cached financial data marked as "potentially stale" rather than blocking entirely. Users get reduced functionality but can still accomplish their primary tasks.

Well-designed error handling substantially reduces user frustration compared to systems that simply display technical error messages. The pattern uses user-friendly error messages ("Unable to connect to financial system — using last known data") while logging detailed technical information for debugging.

Retry logic requires careful calibration. Immediate retries work for transient network issues. Exponential backoff prevents overwhelming systems already under stress. Circuit breakers stop retry attempts when error rates indicate systemic problems rather than transient issues. Some errors (permission denied, invalid input) should never retry without human intervention.
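A minimal circuit-breaker sketch: after a threshold of consecutive failures the breaker opens and rejects calls until a cooldown passes, then lets a probe through. The threshold and cooldown values are illustrative:

```python
class CircuitBreaker:
    def __init__(self, threshold: int = 3, cooldown: float = 30.0):
        self.threshold = threshold       # consecutive failures before opening
        self.cooldown = cooldown         # seconds before allowing a probe
        self.failures = 0
        self.opened_at: float | None = None

    def allow(self, now: float) -> bool:
        if self.opened_at is None:
            return True
        if now - self.opened_at >= self.cooldown:
            self.opened_at = None        # half-open: let a probe through
            self.failures = 0
            return True
        return False                     # circuit open: reject the call

    def record_failure(self, now: float) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = now         # open the circuit

    def record_success(self) -> None:
        self.failures = 0                # healthy again: reset the count

cb = CircuitBreaker()
for t in (0.0, 1.0, 2.0):
    cb.record_failure(now=t)
blocked = not cb.allow(now=3.0)          # open: calls rejected
reopened = cb.allow(now=40.0)            # cooldown elapsed: probe allowed
```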

Monitoring and alerting focus on patterns, not individual events. A single API timeout is normal. Five timeouts in ten minutes suggests a problem. Alert thresholds account for normal error rates while catching abnormal patterns quickly enough to prevent user impact.

The recovery phase often involves human intervention, but the system should prepare for this efficiently. Error reports include: what operation failed, what state was preserved, what options exist for manual recovery, what data might need verification. This turns a system failure into a structured handoff to human operators.

From Pilot to Production: Architecture Evolution

Start with single-shot workflows and basic error handling, then add complexity based on observed failure modes rather than anticipated problems. This evolution path prevents over-engineering while building toward production-grade reliability through incremental improvement based on real usage data.

The pilot phase focuses on core functionality: basic Claude integration, essential MCP servers, simple success/failure handling. Avoid complex orchestration, multi-model routing, or sophisticated error recovery. The goal is demonstrating value while learning how users actually interact with the system.

Production patterns emerge from pilot observations. If users frequently request similar complex analyses, implement the producer-critic pattern. If processing volume creates latency issues, add parallelization. If users lose work due to connection problems, add state persistence. Each addition solves a proven problem rather than a theoretical one.

Systems that evolve based on real usage patterns consistently earn higher user satisfaction than systems designed around theoretical requirements. This supports the incremental complexity approach over comprehensive initial design.

The upgrade path from pilot to production typically follows this sequence: core functionality → error handling → state persistence → parallelization → quality patterns → advanced orchestration. Each phase builds on validated learnings from the previous phase, reducing the risk of building features that users don't need.

Observability becomes critical during this evolution. Track task completion rates, error frequencies, user retry behavior, and abandonment points. These metrics guide which complexity additions provide the most value. A 20% task abandonment rate at a specific step suggests the need for better error handling at that point.

Cost monitoring reveals optimization opportunities. If 80% of tasks use Sonnet but could succeed with Haiku, implement model routing. If prompt caching could reduce costs by 30%, add caching infrastructure. The pilot provides the usage data needed to make informed optimization decisions.

The final architecture reflects actual usage patterns rather than initial assumptions. This typically results in simpler, more reliable systems because complexity gets added only where it solves real problems. Production deployments built this way tend to carry far fewer unused features and score meaningfully higher on reliability than comprehensive up-front designs.

Questions about implementing these agentic patterns in your organization? The architectural decisions depend heavily on your specific systems, workflows, and risk tolerance. Get in touch to discuss how these patterns apply to your technical environment.
