Agentic orchestration: the manager pattern that's quietly eating B2B AI
One AI that does everything is dead. The companies winning ship multi-agent orchestrators that look more like a small team than a chatbot. Here's the honest version: patterns that work, costs that bite, and where LATAM operators have an advantage.
The most useful frame for thinking about AI in 2026 is not "the model is smart enough now." It is: the model is finally good enough to be a worker, but only inside a structure that decides what it works on, in what order, with what data, and under what supervision. The structure is what people call "agentic orchestration." It is the pattern that's quietly eating the B2B AI category.
This post is the honest version of where that's actually working, where it's actually breaking, and what changes for LATAM operators specifically.
Why "one big chatbot" stopped winning
For two years, the default deployment was: take a strong model, give it some tools, write a long system prompt, hope it figures the rest out. Customers saw demos that looked magical. Then they tried real workflows and watched the same single agent forget the goal at step 40, hallucinate a database row, or quietly ignore a constraint in the prompt.
Anthropic's research team reported a 90.2% performance lift on internal research tasks when they moved from a single Opus 4 agent to a lead Opus 4 orchestrating Sonnet 4 sub-agents. The single best model, alone, lost to a coordinated team of cheaper models doing scoped work. The argument is no longer "use the smartest model." It is "build the right shape of work."
The shapes have names now. The 2025–2026 consensus taxonomy comes mostly from Anthropic's "Building Effective Agents" plus the OpenAI cookbook on agent handoffs.
The named patterns, in plain English
- Prompt chaining. A sequence of LLM calls where each step's output feeds the next. Use when a task decomposes cleanly: extract → categorize → summarize → draft. Cheap, predictable, easy to debug.
- Routing. A classifier (LLM or rule-based) sends each request to a specialized downstream prompt or agent. The "easy questions to a small model, hard questions to a frontier model" pattern is a routing instance. The single biggest cost lever for production agents.
- Parallelization. The same input fans out to multiple workers. Two flavors: sectioning (split independent subtasks across agents) and voting (run the same task N times and aggregate). Voting is your reliability lever for high-stakes outputs.
- Orchestrator-worker (a.k.a. supervisor / lead-subagent). A central LLM dynamically decomposes the task, spawns workers, synthesizes the results. The pattern Anthropic's research system uses. The right shape when you cannot pre-plan the work.
- Evaluator-optimizer. A "doer" produces output; a "judge" scores it against a rubric; the doer revises. Closes the loop on quality at the cost of more tokens. Use it where output quality matters more than latency.
- Plan-execute. A planner emits an ordered plan; a cheaper executor walks it step by step. Cheaper than ReAct for long horizons because the expensive model only plans once.
- ReAct (Reason + Act). Interleaved thought / tool-call / observation in a single loop. The 2022 baseline; still the right starting point for short tasks.
- Reflection / Reflexion. Agent critiques its own output and retries. A single-agent self-loop variant of evaluator-optimizer. Useful, expensive.
- Swarm / handoff. Peer agents transfer control with explicit handoff functions; only one agent's instructions are "active" at a time. The OpenAI Agents SDK vocabulary. Good for "specialist desk" experiences (sales agent → support agent → billing agent).
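To make the routing pattern concrete, here is a minimal sketch. The tier names, the escalation keywords, and the `call_model` stub are all illustrative assumptions, not real API identifiers:

```python
CHEAP_TIER = "small-model"        # hypothetical cheap-tier model name
FRONTIER_TIER = "frontier-model"  # hypothetical frontier-tier model name

# Keywords that suggest a request needs frontier-level reasoning.
# In a real system this list comes from your own failure analysis.
ESCALATION_HINTS = ("contract", "legal", "refund dispute", "multi-step")

def route(request: str) -> str:
    """Return the model tier a request should be sent to."""
    text = request.lower()
    if len(text) > 500 or any(hint in text for hint in ESCALATION_HINTS):
        return FRONTIER_TIER
    return CHEAP_TIER

def call_model(tier: str, request: str) -> str:
    """Stub standing in for a real LLM client call."""
    return f"[{tier}] answered: {request[:40]}"

def handle(request: str) -> str:
    # The router runs before any expensive call is made: this is
    # why routing is the biggest cost lever in production.
    return call_model(route(request), request)
```

In production the classifier is often itself a small LLM call; the rule-based version above is the cheapest starting point and has the advantage of being trivially testable.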
If your team has not picked names for the patterns it ships, start there. You cannot debug what you cannot label.
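The voting flavor of parallelization is just as small. A sketch, with stub workers standing in for parallel model calls (a real system would run them concurrently via threads or asyncio; this version is sequential for clarity):

```python
from collections import Counter

def vote(task: str, workers: list, min_agreement: int = 2):
    """Fan the same task out to several workers and return the
    majority answer, or None if no answer reaches min_agreement."""
    answers = [worker(task) for worker in workers]
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count >= min_agreement else None
```

Returning `None` on disagreement is the point: in high-stakes workflows, "the agents did not agree" should escalate to a human, not silently pick a winner.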
What's actually shipped, with numbers
The honest case studies, all from public-record reporting:
- Klarna (fintech support). Their AI assistant handled 2.3M customer conversations in its first month, the equivalent of about 700 full-time agents. CSAT up 47%. Resolution time down to two minutes. Roughly $60M saved by Q3 2025. Then in May 2025 Klarna walked it back toward a hybrid "Uber-style" human pool when complex empathy-heavy cases revealed the limits. The most cited and most honest case in the category.
- Sierra (customer-experience platform, $10B valuation Sept 2025). Chime: resolution rate moved from 40% to 70%+. Hertz: deflection rate from 10% to 70%+ in six weeks.
- Harvey (legal). About $100M ARR by Aug 2025; active matters jumped 36× in 18 months. In May 2025 they pivoted into a multi-model orchestrator routing across OpenAI / Google / Anthropic by query type.
- BDO Colombia (finance / payroll, LATAM). Built on Microsoft Copilot Studio + Power Platform: 50% workload reduction, 99.9% accuracy on the managed request types. One of the few publicly documented LATAM agentic deployments.
- Santander + Visa launched Latin America's first end-to-end AI-agent payment system in March 2026.
If you want the specific source for any of these, the references at the bottom of this post link to each one.
The honest critique
Now the part that does not show up in the marketing.
Cost iceberg. Agentic deployments use 20–30× more tokens than vanilla genAI workflows. Multi-turn loops grow tokens quadratically. A Reflexion-10 loop is roughly 50× a single pass. Unconstrained agents can spend $5–8 per task on frontier models.
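The quadratic growth is simple arithmetic: when each turn re-sends the full conversation so far, cumulative input tokens scale with the square of the turn count. A back-of-envelope sketch, with illustrative sizes:

```python
def loop_token_cost(turns: int, tokens_per_turn: int) -> int:
    """Cumulative input tokens when every turn re-sends the whole
    conversation so far (the default in naive multi-turn loops)."""
    # Turn k re-sends k * tokens_per_turn of accumulated context.
    return sum(k * tokens_per_turn for k in range(1, turns + 1))

single_pass = loop_token_cost(1, 2_000)   # 2,000 tokens
ten_rounds = loop_token_cost(10, 2_000)   # 110,000 tokens: 55x one pass
```

That 55× multiplier for a ten-round loop is where the "Reflexion-10 is roughly 50× a single pass" figure comes from, and why context pruning and summarization checkpoints pay for themselves.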
Reliability ceiling. Evaluations from both Anthropic and Galileo put agent success rates on complex real-world tasks at around 50%. Gartner predicts more than 40% of agentic projects will be canceled by end of 2027. This is not because the technology is bad; it is because most teams ship in shapes that fail quietly.
Cascading failures. A single bad inference at step 3 of a 50-step plan propagates. The Replit incident in July 2025 (an agent deleted a production database despite explicit freeze instructions) is the canonical example. 88% of organizations reported at least one agent-related security incident in 2025.
Context drift. By step 40 to 50 of a long task, the agent loses grip on the goal. Long-running agents need explicit "remind me what I am doing" checkpoints, or they wander.
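One cheap mitigation is to re-inject the original goal into the context at a fixed cadence. A minimal sketch, assuming a chat-style message list; the 10-step interval is an assumption to tune, not a benchmark:

```python
GOAL_REMINDER_EVERY = 10  # tuning knob: how often to re-anchor the agent

def with_goal_checkpoint(messages: list[dict], goal: str, step: int) -> list[dict]:
    """Append a system reminder of the original goal every N steps,
    so a long-running agent cannot silently drift off-task."""
    if step > 0 and step % GOAL_REMINDER_EVERY == 0:
        return messages + [{
            "role": "system",
            "content": f"Checkpoint: the original goal is still: {goal}",
        }]
    return messages
```

The reminder costs a few dozen tokens per checkpoint, which is cheap insurance against an agent spending steps 40 through 50 solving the wrong problem.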
Debugging nightmare. Emergent multi-agent behaviors require new observability tooling: decision paths, agent-to-agent message logs, tool-call traces. Without it, post-mortems take days.
Coordination overhead. Multi-agent systems often spend more compute on agents waiting on each other than on the actual work. The pattern most people copy from Twitter — five agents in a swarm — is usually slower and more expensive than the orchestrator-worker pattern that gets recommended in the literature.
The honest read: agentic systems work, and they fail in ways operators are not used to. If your team treats the agent like a deterministic API, your post-mortems will be confusing.
Where the LATAM operator has an unfair advantage
This is the part Silicon Valley posts will not write.
The adoption gap is wide and well-defined. Roughly 95% of South American firms touch generative AI in some form (Bain, May 2025). But only 14% have an agentic project in production according to regional industry data. That 81-point gap is the entire opportunity.
Cost-aware patterns are not optional. B2B contracts in LATAM are smaller than in NA / EU. The "agent costs $5 per task" headline lands harder here. That makes routing, plan-execute, and evaluator-optimizer with cheap-tier executors the default architectures, not nice-to-haves. LATAM operators forced into cost discipline build leaner agentic systems on average.
Spanish-language coverage is genuinely good. Frontier models perform strongly in Spanish in 2026. The remaining engineering work is regional vocabulary, Portuguese for Brazil, ES↔EN handoffs in operations workflows, and a few indigenous-language edge cases. Solvable. Worth doing.
Less compliance drag, for now. No LATAM equivalent of the EU AI Act has shipped yet. National AI strategies are emerging but the regulatory pace is still slower than Brussels. There is a 12–18 month window where shipping production agents is structurally easier here than in Europe.
Banks and consultancies are the channel. Santander+Visa, NTT-Data+AWS, BDO. The buyers are partnership-led, not VC-led. Pitch agentic systems as plumbing for an existing channel partner, not as a SaaS app on a card.
How to choose the right pattern
The boring practical guidance, in order:
- Start with prompt chaining and routing. They cover 70% of real B2B use cases. They are cheap. They are debuggable.
- Add evaluator-optimizer where output quality is non-negotiable. Legal, medical, financial. Pay the token cost.
- Reach for orchestrator-worker only when the task structure is genuinely unknowable up front. Research, complex sales-cycle workflows, multi-document contract negotiations.
- Avoid swarms unless you specifically need a "specialist desk" UX. They are a beautiful demo and a brutal post-mortem.
- Instrument everything. If you cannot replay a failed run end-to-end, you do not have an agentic system. You have a black box that occasionally embarrasses you.
Closing
The companies winning the next two years of B2B AI will not be the ones with the smartest models. They will be the ones with the best-shaped work.
That is good news for LATAM operators, because the shape of the work is bottlenecked on judgment, not on engineering. Industrial-engineering thinking — process flows, throughput, bottlenecks, quality gates — is exactly the muscle this category rewards. The models are global. The operations are local. The orchestration is where the two meet.
References
- Anthropic — Building Effective Agents
- Anthropic — How we built our multi-agent research system
- Anthropic — Effective harnesses for long-running agents
- LangChain — Benchmarking multi-agent architectures
- OpenAI Cookbook — Orchestrating agents: routines and handoffs
- McKinsey — The state of AI
- Galileo — The hidden cost of agentic AI
- Adversa — Cascading failures in agentic AI (OWASP ASI08)
- ItWareLatam — Solo el 14% de las empresas LATAM está lista para IA agéntica
- Microsoft LATAM — El futuro de los negocios impulsados por IA y agentes
- Latin America Reports — Agentic AI adoption across LATAM

