# Agent Workflow Designer

**Tier:** POWERFUL  
**Category:** Engineering  
**Domain:** Multi-Agent Systems / AI Orchestration

---

## Overview

Design production-grade multi-agent orchestration systems. Covers five core patterns (sequential pipeline, parallel fan-out/fan-in, hierarchical delegation, event-driven, consensus), platform-specific implementations, handoff protocols, state management, error recovery, context window budgeting, and cost optimization.

---

## Core Capabilities

- Pattern selection guide for any orchestration requirement
- Handoff protocol templates (structured context passing)
- State management patterns for multi-agent workflows
- Error recovery and retry strategies
- Context window budget management
- Cost optimization strategies per platform
- Platform-specific configs: Claude Code Agent Teams, OpenClaw, CrewAI, AutoGen

---

## When to Use

- Building a multi-step AI pipeline that exceeds one agent's context capacity
- Parallelizing research, generation, or analysis tasks for speed
- Creating specialist agents with defined roles and handoff contracts
- Designing fault-tolerant AI workflows for production

---

## Pattern Selection Guide

```
Is the task sequential (each step needs previous output)?
  YES → Sequential Pipeline
  NO  → Can tasks run in parallel?
          YES → Parallel Fan-out/Fan-in
          NO  → Is there a hierarchy of decisions?
                  YES → Hierarchical Delegation
                  NO  → Is it event-triggered?
                          YES → Event-Driven
                          NO  → Need consensus/validation?
                                  YES → Consensus Pattern
                                  NO  → Single agent (no orchestration needed)
```
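
The decision tree above can be encoded as a small helper, for example to record which pattern a workflow was designed around. This is a sketch: the `select_pattern` function and its boolean flags are hypothetical and simply mirror the questions in the tree, checked in the same order. When no question matches, it falls back to a single agent, following the over-orchestration advice in Common Pitfalls.

```python
def select_pattern(
    sequential: bool,
    parallelizable: bool,
    hierarchical: bool,
    event_triggered: bool,
    needs_consensus: bool,
) -> str:
    """Walk the pattern-selection questions in order (hypothetical helper)."""
    if sequential:
        return "Sequential Pipeline"
    if parallelizable:
        return "Parallel Fan-out/Fan-in"
    if hierarchical:
        return "Hierarchical Delegation"
    if event_triggered:
        return "Event-Driven"
    if needs_consensus:
        return "Consensus Pattern"
    # No question matched: a single agent is probably enough
    return "Single agent"


# Research -> Draft -> Review: each step needs the previous output
print(select_pattern(True, False, False, False, False))  # Sequential Pipeline
```

Because the questions are checked in order, the first "yes" wins, exactly as in the tree.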

---

## Pattern 1: Sequential Pipeline

**Use when:** Each step depends on the previous output. Research → Draft → Review → Polish.

```python
# sequential_pipeline.py
from dataclasses import dataclass
import anthropic

@dataclass
class PipelineStage:
    name: str
    system_prompt: str
    input_key: str      # what to take from state
    output_key: str     # what to write to state
    model: str = "claude-3-5-sonnet-20241022"
    max_tokens: int = 2048

class SequentialPipeline:
    def __init__(self, stages: list[PipelineStage]):
        self.stages = stages
        self.client = anthropic.Anthropic()
    
    def run(self, initial_input: str) -> dict:
        state = {"input": initial_input}
        
        for stage in self.stages:
            print(f"[{stage.name}] Processing...")
            
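            # Fail fast rather than silently sending an empty prompt to the model
            if stage.input_key not in state:
                raise KeyError(f"Stage '{stage.name}' needs '{stage.input_key}', which no earlier stage produced")
            stage_input = state[stage.input_key]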
            
            response = self.client.messages.create(
                model=stage.model,
                max_tokens=stage.max_tokens,
                system=stage.system_prompt,
                messages=[{"role": "user", "content": stage_input}],
            )
            
            state[stage.output_key] = response.content[0].text
            state[f"{stage.name}_tokens"] = response.usage.input_tokens + response.usage.output_tokens
            
            print(f"[{stage.name}] Done. Tokens: {state[f'{stage.name}_tokens']}")
        
        return state

# Example: Blog post pipeline
pipeline = SequentialPipeline([
    PipelineStage(
        name="researcher",
        system_prompt="You are a research specialist. Given a topic, produce a structured research brief with: key facts, statistics, expert perspectives, and controversy points.",
        input_key="input",
        output_key="research",
    ),
    PipelineStage(
        name="writer",
        system_prompt="You are a senior content writer. Using the research provided, write a compelling 800-word blog post with a clear hook, 3 main sections, and a strong CTA.",
        input_key="research",
        output_key="draft",
    ),
    PipelineStage(
        name="editor",
        system_prompt="You are a copy editor. Review the draft for: clarity, flow, grammar, and SEO. Return the improved version only, no commentary.",
        input_key="draft",
        output_key="final",
    ),
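])

if __name__ == "__main__":
    result = pipeline.run("How multi-agent systems change software teams")
    print(result["final"])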
```
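
Because each stage reads one key from the shared state and writes another, the state-threading logic can be unit-tested without calling a model. A minimal sketch, assuming stages are plain Python callables; the `run_stages` helper and the stub stages are hypothetical stand-ins for the `messages.create` calls above:

```python
from typing import Callable

# (name, input_key, output_key, fn): a stub stage maps input text to output text
Stage = tuple[str, str, str, Callable[[str], str]]


def run_stages(stages: list[Stage], initial_input: str) -> dict:
    """Thread a shared state dict through the stages, like SequentialPipeline.run."""
    state = {"input": initial_input}
    for name, input_key, output_key, fn in stages:
        if input_key not in state:
            raise KeyError(f"{name} needs '{input_key}', which no earlier stage produced")
        state[output_key] = fn(state[input_key])
    return state


stages: list[Stage] = [
    ("researcher", "input", "research", lambda t: f"brief({t})"),
    ("writer", "research", "draft", lambda t: f"draft({t})"),
    ("editor", "draft", "final", lambda t: f"edited({t})"),
]
print(run_stages(stages, "AI agents")["final"])  # edited(draft(brief(AI agents)))
```

The nested output makes the data flow visible: each stage sees only what the previous stage wrote.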

---

## Pattern 2: Parallel Fan-out / Fan-in

**Use when:** Independent tasks that can run concurrently. Research 5 competitors simultaneously.

```python
# parallel_fanout.py
import asyncio
import anthropic

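async def run_agent(client, task_name: str, system: str, user: str, model: str = "claude-3-5-sonnet-20241022") -> dict:
    """Single agent call, run in a worker thread so calls overlap.

    anthropic.AsyncAnthropic is a native-async alternative to the thread pool.
    """
    loop = asyncio.get_running_loop()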
    
    def _call():
        return client.messages.create(
            model=model,
            max_tokens=2048,
            system=system,
            messages=[{"role": "user", "content": user}],
        )
    
    response = await loop.run_in_executor(None, _call)
    return {
        "task": task_name,
        "output": response.content[0].text,
        "tokens": response.usage.input_tokens + response.usage.output_tokens,
    }

async def parallel_research(competitors: list[str], research_type: str) -> dict:
    """Fan-out: research all competitors in parallel. Fan-in: synthesize results."""
    client = anthropic.Anthropic()
    
    # FAN-OUT: spawn parallel agent calls
    tasks = [
        run_agent(
            client,
            task_name=competitor,
            system=f"You are a competitive intelligence analyst. Research {competitor} and provide: pricing, key features, target market, and known weaknesses.",
            user=f"Analyze {competitor} for comparison with our product in the {research_type} market.",
        )
        for competitor in competitors
    ]
    
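    results = await asyncio.gather(*tasks, return_exceptions=True)
    
    # Handle failures gracefully; gather preserves input order, so zip recovers names
    successful = [r for r in results if not isinstance(r, Exception)]
    failed = [(name, r) for name, r in zip(competitors, results) if isinstance(r, Exception)]
    
    if failed:
        print(f"Warning: {len(failed)} research tasks failed: {failed}")
    if not successful:
        raise RuntimeError("All research tasks failed; nothing to synthesize")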
    
    # FAN-IN: synthesize
    combined_research = "\n\n".join([
        f"## {r['task']}\n{r['output']}" for r in successful
    ])
    
    synthesis = await run_agent(
        client,
        task_name="synthesizer",
        system="You are a strategic analyst. Synthesize competitor research into a concise comparison matrix and strategic recommendations.",
        user=f"Synthesize these competitor analyses:\n\n{combined_research}",
        model="claude-3-5-sonnet-20241022",
    )
    
    return {
        "individual_analyses": successful,
        "synthesis": synthesis["output"],
        "total_tokens": sum(r["tokens"] for r in successful) + synthesis["tokens"],
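    }

if __name__ == "__main__":
    report = asyncio.run(parallel_research(["Competitor A", "Competitor B", "Competitor C"], "project management"))
    print(report["synthesis"])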
```
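
`parallel_research` starts every call at once, which can trip provider rate limits when the competitor list is long. A sketch of bounding concurrency with an `asyncio.Semaphore` and adding a per-call wall-clock timeout with `asyncio.wait_for`; `bounded_fan_out` and `fake_agent` are hypothetical (the stub stands in for `run_agent`), and the limits are illustrative:

```python
import asyncio


async def bounded_fan_out(items, worker, max_concurrency=3, timeout_s=30.0):
    """Run worker(item) for every item, at most max_concurrency at a time.

    Returns results in input order; failed or timed-out items yield the exception.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item):
        async with semaphore:
            # The timeout starts once a slot is acquired; queue time is not counted
            return await asyncio.wait_for(worker(item), timeout=timeout_s)

    return await asyncio.gather(*(guarded(i) for i in items), return_exceptions=True)


async def fake_agent(name: str) -> str:
    await asyncio.sleep(0.01)  # simulate model latency
    if name == "slow":
        await asyncio.sleep(10)  # simulate a stuck call
    return f"analysis of {name}"


results = asyncio.run(bounded_fan_out(["A", "B", "slow"], fake_agent, timeout_s=0.5))
print(results[:2])  # ['analysis of A', 'analysis of B']
```

One caveat: cancelling a coroutine that wraps `run_in_executor` does not stop the worker thread, so the HTTP request itself should also be bounded by the API client's own timeout option.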

---

## Pattern 3: Hierarchical Delegation

**Use when:** Complex tasks with subtask discovery. Orchestrator breaks down work, delegates to specialists.

```python
# hierarchical_delegation.py
import json
import anthropic

ORCHESTRATOR_SYSTEM = """You are an orchestration agent. Your job is to:
1. Analyze the user's request
2. Break it into subtasks
3. Assign each to the appropriate specialist agent
4. Collect results and synthesize

Available specialists:
- researcher: finds facts, data, and information
- writer: creates content and documents  
- coder: writes and reviews code
- analyst: analyzes data and produces insights

Respond with ONLY a JSON plan, no prose or code fences:
{
  "subtasks": [
    {"id": "1", "agent": "researcher", "task": "...", "depends_on": []},
    {"id": "2", "agent": "writer", "task": "...", "depends_on": ["1"]}
  ]
}"""

SPECIALIST_SYSTEMS = {
    "researcher": "You are a research specialist. Find accurate, relevant information and cite sources when possible.",
    "writer": "You are a professional writer. Create clear, engaging content in the requested format.",
    "coder": "You are a senior software engineer. Write clean, well-commented code with error handling.",
    "analyst": "You are a data analyst. Provide structured analysis with evidence-backed conclusions.",
}

class HierarchicalOrchestrator:
    def __init__(self):
        self.client = anthropic.Anthropic()
    
    def run(self, user_request: str) -> str:
        # 1. Orchestrator creates plan
        plan_response = self.client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            system=ORCHESTRATOR_SYSTEM,
            messages=[{"role": "user", "content": user_request}],
        )
        
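        # Models sometimes wrap JSON in prose or code fences; parse the outermost {...}
        plan_text = plan_response.content[0].text
        plan = json.loads(plan_text[plan_text.find("{") : plan_text.rfind("}") + 1])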
        results = {}
        
        # 2. Execute subtasks respecting dependencies
        for subtask in self._topological_sort(plan["subtasks"]):
            context = self._build_context(subtask, results)
            specialist = SPECIALIST_SYSTEMS[subtask["agent"]]
            
            result = self.client.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=2048,
                system=specialist,
                messages=[{"role": "user", "content": f"{context}\n\nTask: {subtask['task']}"}],
            )
            results[subtask["id"]] = result.content[0].text
        
        # 3. Final synthesis
        all_results = "\n\n".join([f"### {k}\n{v}" for k, v in results.items()])
        synthesis = self.client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=2048,
            system="Synthesize the specialist outputs into a coherent final response.",
            messages=[{"role": "user", "content": f"Original request: {user_request}\n\nSpecialist outputs:\n{all_results}"}],
        )
        return synthesis.content[0].text
    
    def _build_context(self, subtask: dict, results: dict) -> str:
        if not subtask.get("depends_on"):
            return ""
        deps = [f"Output from task {dep}:\n{results[dep]}" for dep in subtask["depends_on"] if dep in results]
        return "Previous results:\n" + "\n\n".join(deps) if deps else ""
    
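    def _topological_sort(self, subtasks: list) -> list:
        # Run tasks in dependency order; fail fast on cycles or unknown ids
        ordered, remaining = [], list(subtasks)
        completed = set()
        while remaining:
            ready = [t for t in remaining if all(dep in completed for dep in t.get("depends_on", []))]
            if not ready:
                # Without this check, a cycle or a mistyped dependency would loop forever
                ids = [t["id"] for t in remaining]
                raise ValueError(f"Unresolvable dependencies among subtasks {ids}")
            for task in ready:
                ordered.append(task)
                completed.add(task["id"])
                remaining.remove(task)
        return ordered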
```
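
The `depends_on` lists also reveal which subtasks can run concurrently: every task whose dependencies are finished can start in the same "wave". A sketch using Kahn's algorithm; the `validate_and_group` helper is hypothetical. It rejects plans that name unknown agents, reference missing tasks, or contain cycles, then groups the remaining task ids into waves:

```python
def validate_and_group(subtasks: list[dict], known_agents: set[str]) -> list[list[str]]:
    """Return task ids grouped into waves; tasks within a wave are independent."""
    ids = {t["id"] for t in subtasks}
    for t in subtasks:
        if t["agent"] not in known_agents:
            raise ValueError(f"Task {t['id']}: unknown agent {t['agent']!r}")
        missing = set(t.get("depends_on", [])) - ids
        if missing:
            raise ValueError(f"Task {t['id']}: depends on missing tasks {sorted(missing)}")

    # Kahn's algorithm: track the unmet dependencies of each task
    pending = {t["id"]: set(t.get("depends_on", [])) for t in subtasks}
    waves = []
    while pending:
        ready = sorted(tid for tid, deps in pending.items() if not deps)
        if not ready:
            raise ValueError(f"Dependency cycle among tasks {sorted(pending)}")
        waves.append(ready)
        for tid in ready:
            del pending[tid]
        for deps in pending.values():
            deps.difference_update(ready)
    return waves


plan = [
    {"id": "1", "agent": "researcher", "depends_on": []},
    {"id": "2", "agent": "analyst", "depends_on": []},
    {"id": "3", "agent": "writer", "depends_on": ["1", "2"]},
]
print(validate_and_group(plan, {"researcher", "writer", "coder", "analyst"}))
# [['1', '2'], ['3']]
```

Each wave could then be dispatched with the fan-out pattern from Pattern 2 instead of running every subtask one at a time.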

---

## Handoff Protocol Template

```python
# Standard handoff context format: use between all agents
from dataclasses import dataclass

@dataclass
class AgentHandoff:
    """Structured context passed between agents in a workflow."""
    task_id: str
    workflow_id: str
    step_number: int
    total_steps: int
    
    # What was done
    previous_agent: str
    previous_output: str
    artifacts: dict  # {"filename": "content"} for any files produced
    
    # What to do next
    current_agent: str
    current_task: str
    constraints: list[str]  # hard rules for this step
    
    # Metadata
    context_budget_remaining: int  # tokens left for this agent
    cost_so_far_usd: float
    
    def to_prompt(self) -> str:
        return f"""
# Agent Handoff — Step {self.step_number}/{self.total_steps}

## Your Task
{self.current_task}

## Constraints
{chr(10).join(f'- {c}' for c in self.constraints)}

## Context from Previous Step ({self.previous_agent})
{self.previous_output[:2000]}{"... [truncated]" if len(self.previous_output) > 2000 else ""}

## Artifacts Available
{", ".join(self.artifacts) or "none"}

## Context Budget
You have approximately {self.context_budget_remaining} tokens remaining. Be concise.
"""
```
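
For state management, a handoff record like this can be persisted between steps, for example so a crashed workflow can resume from its last completed step. A minimal sketch using `dataclasses.asdict` and JSON; the trimmed `Handoff` class, its fields, and the `save`/`load` helpers are hypothetical (a subset of `AgentHandoff`):

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Handoff:
    workflow_id: str
    step_number: int
    previous_agent: str
    previous_output: str
    artifacts: dict = field(default_factory=dict)
    constraints: list = field(default_factory=list)


def save(h: Handoff) -> str:
    # asdict converts the dataclass (recursively) into plain dicts and lists
    return json.dumps(asdict(h))


def load(raw: str) -> Handoff:
    return Handoff(**json.loads(raw))


h = Handoff("wf-1", 2, "researcher", "brief...", {"notes.md": "# Notes"}, ["cite sources"])
restored = load(save(h))
print(restored == h)  # True
```

The JSON string can be written to a file, database row, or key-value store keyed by `workflow_id` and `step_number`.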

---

## Error Recovery Patterns

```python
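import time
from functools import wraps

import anthropic

client = anthropic.Anthropic()

# Transient errors worth retrying; client errors such as BadRequestError are not
RETRYABLE_ERRORS = (anthropic.RateLimitError, anthropic.APIConnectionError, anthropic.InternalServerError)

def with_retry(max_attempts=3, backoff_seconds=2, fallback_model=None):
    """Decorator for agent calls with exponential backoff and model fallback.

    The wrapped function must take `model` as a keyword argument so the fallback
    can replace it. The Anthropic client also retries some errors on its own
    (configurable via its `max_retries` option).
    """
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except RETRYABLE_ERRORS as e:
                    if attempt == max_attempts - 1:
                        raise  # out of attempts: surface the last error
                    wait = backoff_seconds * (2 ** attempt)
                    print(f"Attempt {attempt+1} failed: {e}. Retrying in {wait}s...")
                    time.sleep(wait)

                    # Fall back to cheaper/faster model on rate limit
                    if fallback_model and isinstance(e, anthropic.RateLimitError):
                        kwargs["model"] = fallback_model
        return wrapper
    return decorator

@with_retry(max_attempts=3, fallback_model="claude-3-haiku-20240307")
def call_agent(*, model: str, system: str, user: str) -> str:
    # Keyword-only arguments let the decorator swap `model` safely
    response = client.messages.create(
        model=model,
        max_tokens=2048,
        system=system,
        messages=[{"role": "user", "content": user}],
    )
    return response.content[0].text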
```
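
When more than one fallback model is available, a chain can be tried in order, with randomized ("full jitter") backoff so that many parallel agents do not retry in lockstep. A provider-agnostic sketch: `TransientError`, `call_with_fallbacks`, `flaky`, and the model names are hypothetical, and `sleep` is injectable so the logic can be tested without waiting:

```python
import random
import time


class TransientError(Exception):
    """Stand-in for retryable errors such as rate limits or 5xx responses."""


def call_with_fallbacks(call, models, attempts_per_model=2, base_s=1.0, sleep=time.sleep):
    """Try call(model) for each model in order; return (model, result) on success."""
    last_error = None
    for model in models:
        for attempt in range(attempts_per_model):
            try:
                return model, call(model)
            except TransientError as e:
                last_error = e
                if attempt < attempts_per_model - 1:
                    # Full jitter: wait a random time in [0, base_s * 2**attempt]
                    sleep(random.uniform(0, base_s * 2 ** attempt))
        # Out of attempts for this model: move to the next one without waiting
    raise RuntimeError(f"All models failed; last error: {last_error}") from last_error


def flaky(model):
    if model == "big-model":
        raise TransientError("rate limited")
    return f"answer from {model}"


print(call_with_fallbacks(flaky, ["big-model", "small-model"], sleep=lambda s: None))
# ('small-model', 'answer from small-model')
```

Returning the model name alongside the result makes it easy to log when a workflow silently degraded to a cheaper model.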

---

## Context Window Budgeting

```python
# Budget context across a multi-step pipeline
# Rule: never let any step consume more than 60% of remaining budget

CONTEXT_LIMITS = {
    "claude-3-5-sonnet-20241022": 200_000,
    "gpt-4o": 128_000,
}

class ContextBudget:
    def __init__(self, model: str, reserve_pct: float = 0.2):
        total = CONTEXT_LIMITS.get(model, 128_000)
        self.total = total
        self.reserve = int(total * reserve_pct)  # held back as a safety buffer (20% by default)
        self.used = 0
    
    @property
    def remaining(self):
        return self.total - self.reserve - self.used
    
    def allocate(self, step_name: str, requested: int) -> int:
        allocated = max(0, min(requested, int(self.remaining * 0.6)))  # max 60% of remaining, never negative
        print(f"[Budget] {step_name}: allocated {allocated:,} tokens (remaining: {self.remaining:,})")
        return allocated
    
    def consume(self, tokens_used: int):
        self.used += tokens_used

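def truncate_to_budget(text: str, token_budget: int, chars_per_token: float = 4.0) -> str:
    """Rough truncation (~4 chars/token is a common estimate for English text).

    For precision, count tokens with the target model's own tokenizer or
    token-counting API; tiktoken only matches OpenAI models.
    """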
    char_budget = int(token_budget * chars_per_token)
    if len(text) <= char_budget:
        return text
    return text[:char_budget] + "\n\n[... truncated to fit context budget ...]"
```
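
A worked example of the 60% rule, assuming a 200,000-token window, the default 20% reserve, and steps that each request more than the cap allows and then consume their full allocation. The `budget_walkthrough` function is a hypothetical restatement of `ContextBudget`'s arithmetic, without the printing:

```python
def budget_walkthrough(total: int, reserve_pct: float, steps: int) -> list[int]:
    """Allocation per step when every step asks for more than 60% of what's left."""
    remaining = total - int(total * reserve_pct)  # usable tokens after the reserve
    allocations = []
    for _ in range(steps):
        allocated = int(remaining * 0.6)  # cap: 60% of remaining
        allocations.append(allocated)
        remaining -= allocated  # the step consumes its full allocation
    return allocations


print(budget_walkthrough(200_000, 0.2, 3))  # [96000, 38400, 15360]
```

The third step gets less than a tenth of the 160,000 usable tokens, which is why summarizing intermediate outputs matters in long pipelines.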

---

## Cost Optimization Strategies

| Strategy | Approx. savings | Tradeoff |
|---|---|---|
| Use Haiku for routing/classification | 85-90% | Slightly less nuanced judgment |
| Cache repeated system prompts (prompt caching) | 50-90% on cached input | Requires `cache_control` breakpoints; cache writes cost more than normal input |
| Truncate intermediate outputs | 20-40% | May lose detail in handoffs |
| Batch non-urgent tasks (provider batch API) | 50% | Results arrive asynchronously, within 24 hours |
| Use Sonnet for most steps, Opus for the final step only | 60-70% vs. all-Opus | Intermediate steps use a less capable model |
| Short-circuit on confidence threshold | 30-50% | Requires confidence scoring |
---

## Common Pitfalls

- **Circular dependencies** — agents calling each other in loops; enforce DAG structure at design time
- **Context bleed** — passing entire previous output to every step; summarize or extract only what's needed
- **No timeout** — a stuck agent blocks the whole pipeline; always set max_tokens and wall-clock timeouts
- **Silent failures** — agent returns plausible but wrong output; add validation steps for critical paths
- **Ignoring cost** — cost scales with calls × tokens × per-token price, so 10 parallel Opus calls can reach $0.50 or more per workflow; model selection is a cost decision
- **Over-orchestration** — if a single prompt can do it, it should; only add agents when genuinely needed
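
For silent failures, a cheap first line of defense is a structural check between steps, before the next agent consumes the output. A sketch, assuming the step was asked to return a JSON object with a known set of keys; `validate_step_output` and the example keys are hypothetical:

```python
import json


def validate_step_output(raw: str, required_keys: set[str]) -> dict:
    """Parse a step's JSON output and fail loudly if it is malformed or incomplete."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(f"Step output is not valid JSON: {e}") from e
    if not isinstance(data, dict):
        raise ValueError("Step output must be a JSON object")
    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"Step output is missing keys: {sorted(missing)}")
    # Present-but-empty fields are a common form of plausible-looking failure
    empty = sorted(k for k in required_keys if data[k] in ("", [], {}, None))
    if empty:
        raise ValueError(f"Step output has empty values for: {empty}")
    return data


ok = validate_step_output('{"pricing": "$10/seat", "weaknesses": ["no API"]}', {"pricing", "weaknesses"})
print(ok["weaknesses"])  # ['no API']
```

A structural check cannot catch factually wrong content; critical paths still need a review step, but this catches malformed handoffs early and cheaply.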
