BUILD · Jun 14, 2026

How to build model-agnostic agents that survive a provider shutdown

When the US government forced Anthropic to pull Fable 5 globally, every developer building on Claude learned the cost of model lock-in the hard way. Here's how to build agents that switch models in hours, not weeks.


TL;DR: The US government forced Anthropic to pull Fable 5 for all customers. If your agent is coupled to one model, you’re one government order, pricing change, or API deprecation away from a full rewrite. Here’s the architecture that prevents that.

Key takeaways:

  • The Fable 5 ban proved that model access can disappear overnight. Not just for foreign nationals. For everyone.
  • A model-agnostic architecture separates orchestration, context, and governance from the model layer.
  • The gateway pattern (LiteLLM, Portkey) normalizes API differences so routing becomes a config flag.
  • Classify tasks into tiers and route each to the appropriate model. Fall back on failure.
  • Being model-agnostic isn’t about abstracting away differences. It’s about making switching a config change.

On June 12, 2026, the US government ordered Anthropic to suspend access to Fable 5 and Mythos 5 by any foreign national. Anthropic’s response: disable both models for all customers globally. The news drew twenty-seven million views on X within hours. Every developer building on Claude went into emergency triage.

I spent that afternoon migrating a production agent from Fable 5 to DeepSeek V4-Pro. The parts that took hours were the parts I’d abstracted behind a gateway. The parts that took days were the parts I’d coupled directly to Claude’s prompt format and tool schema.

This post is what I learned. Build this architecture now, before the next shutdown hits a model you depend on.

Why model lock-in is an architectural problem

Most teams treat model providers as interchangeable. They’re not. Each one has a different API format, different prompt sensitivity, different tool-calling conventions, and different pricing. The Anthropic Fable 5 ban exposed what happens when you fuse your agent logic to one provider’s control plane.

The cost of provider lock-in shows up in four ways.

Migration tax. Re-platforming prompts, APIs, and validation across codebases. Enterprise AI spend reached $37 billion in 2025 per Menlo Ventures. The prompts, integrations, and evals you build around one provider are engineering work you have to redo when you switch.

Orchestration rigidity. If your routing logic, approval workflows, and governance are embedded in one provider’s SDK, switching means rebuilding the entire control plane.

Pricing exposure. When a provider raises prices or changes their billing model, you have no negotiating position if you can’t walk. Portability is a hedge.

Governance constraints. Data residency, compliance, and jurisdictional requirements change. One provider can’t serve every market.

When single-provider is acceptable

There are three cases where single-provider lock-in is acceptable:

  • Early-stage experimentation before product-market fit.
  • Single-team production with low volume.
  • Strict regulatory requirements that already limit your provider options.

The threshold for multi-provider architecture is crossed when your spend is high enough that a price change hurts, your workflow depth makes re-platforming a multi-quarter project, or your compliance scope spans jurisdictions.

What are the three layers of a model-agnostic agent?

The cleanest model-agnostic stack has three separate layers. Fuse any two of them and you pay migration costs at both layers when you need to switch.

Layer | What it does | Provider-specific?
Model layer | Runs inference, returns completions | Yes, swappable
Context and abstraction layer | Normalizes API format, prompt templates, tool schemas | No
Orchestration layer | Workflow sequencing, routing decisions, governance | No
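The three layers can be sketched as plain Python interfaces. This is a minimal illustration, not a framework: ModelClient, Gateway, run_workflow, and EchoClient are hypothetical names, and EchoClient stands in for a real provider adapter so the example runs offline.

```python
from dataclasses import dataclass
from typing import Protocol


class ModelClient(Protocol):
    """Model layer: one provider-specific adapter per backend (swappable)."""

    def complete(self, messages: list[dict]) -> str: ...


@dataclass
class Gateway:
    """Context and abstraction layer: owns the adapters and the active model."""

    clients: dict[str, ModelClient]
    active: str

    def chat(self, messages: list[dict]) -> str:
        # Orchestration code only calls this method, never a provider SDK.
        return self.clients[self.active].complete(messages)


def run_workflow(gateway: Gateway, task: str) -> str:
    """Orchestration layer: sequencing and governance, no provider imports."""
    plan = gateway.chat([{"role": "user", "content": f"Plan: {task}"}])
    return gateway.chat([{"role": "user", "content": f"Execute: {plan}"}])


class EchoClient:
    """Hypothetical stand-in adapter; a real one would wrap a provider SDK."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, messages: list[dict]) -> str:
        return f"{self.name}:{messages[-1]['content']}"


gateway = Gateway(clients={"a": EchoClient("a"), "b": EchoClient("b")}, active="a")
print(run_workflow(gateway, "fix bug"))
gateway.active = "b"  # switching providers is a config change, not a code change
print(run_workflow(gateway, "fix bug"))
```

Because run_workflow depends only on Gateway.chat, replacing a provider touches the model layer and the gateway config, not the orchestration code.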

The gateway pattern

The abstraction layer is a gateway that translates between your orchestration code and any provider’s API. The OpenAI API format has become the de facto contract here. Tools like LiteLLM and Portkey expose this abstraction as a drop-in proxy.

The gateway handles three things:

Format normalization. Your orchestration code sends one format. The gateway translates to each provider’s native format and back. Tool definitions, system prompts, and response parsing all go through the same contract.

Credential management. API keys and endpoints are configured per provider in the gateway. Your agent code never touches them.

Fallback routing. If provider A returns a 503 or times out, the gateway retries on provider B before your agent even knows there was a failure.
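Format normalization is the least visible of the three jobs. A minimal sketch, assuming OpenAI-style input and covering only two well-known differences: Anthropic's Messages API takes the system prompt as a top-level system field and requires max_tokens, and its tool definitions use a flat input_schema instead of OpenAI's nested function.parameters. The function name is hypothetical; LiteLLM and Portkey also map tool results, streaming, images, and stop reasons, which this sketch skips.

```python
from typing import Optional


def to_anthropic_request(
    model: str,
    messages: list[dict],
    tools: Optional[list[dict]] = None,
    max_tokens: int = 1024,
) -> dict:
    """Translate an OpenAI-style chat request into Anthropic Messages shape."""
    # Anthropic takes the system prompt as a top-level field, not a message.
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    chat = [
        {"role": m["role"], "content": m["content"]}
        for m in messages
        if m["role"] != "system"
    ]
    request: dict = {"model": model, "max_tokens": max_tokens, "messages": chat}
    if system_parts:
        request["system"] = "\n\n".join(system_parts)
    if tools:
        # OpenAI nests function tools under "function" with a "parameters"
        # JSON Schema; Anthropic uses a flat object with "input_schema".
        request["tools"] = [
            {
                "name": t["function"]["name"],
                "description": t["function"].get("description", ""),
                "input_schema": t["function"]["parameters"],
            }
            for t in tools
            if t.get("type") == "function"
        ]
    return request


messages = [
    {"role": "system", "content": "You are a coding agent."},
    {"role": "user", "content": "Run the tests."},
]
tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",
        "description": "Run the test suite",
        "parameters": {"type": "object", "properties": {}},
    },
}]
req = to_anthropic_request("claude-sonnet-4", messages, tools)
print(req["system"])  # You are a coding agent.
```

The orchestration code keeps sending one format; only the gateway knows which translation each provider needs.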

Here’s what this looks like in practice:

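# Instead of this (coupled to one provider):
from anthropic import Anthropic

prompt = "Summarize the failing test output."
client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-fable-5",
    max_tokens=1024,  # required by the Messages API
    messages=[{"role": "user", "content": prompt}],
)
print(response.content[0].text)

# Do this (abstracted behind a gateway):
from your_gateway import ModelGateway  # hypothetical in-house wrapper, e.g. around LiteLLM

gateway = ModelGateway(fallback_chain=["fable-5", "deepseek-v4-pro", "gpt-5.5"])
response = gateway.chat(prompt)  # same call, whichever provider answers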

The gateway code is about 50 lines. The fallback chain is a config file. That’s the difference between an afternoon migration and a six-week rewrite.
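One way to make "the fallback chain is a config file" literal: load the ordered chain from JSON and validate it at startup, so a migration like dropping Fable 5 becomes a one-line config edit. A minimal sketch; the models.json file name, the schema, and the tier names are assumptions.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path

VALID_TIERS = {"budget", "mid", "frontier"}  # assumed tier names


@dataclass(frozen=True)
class ModelEntry:
    provider: str
    model: str
    tier: str


def load_chain(path: Path) -> list[ModelEntry]:
    """Parse and validate the ordered fallback chain from a JSON config file."""
    raw = json.loads(path.read_text())
    # Unknown keys raise TypeError here, which catches config typos early.
    chain = [ModelEntry(**entry) for entry in raw["fallback_chain"]]
    if not chain:
        raise ValueError("fallback_chain must not be empty")
    for entry in chain:
        if entry.tier not in VALID_TIERS:
            raise ValueError(f"unknown tier {entry.tier!r} for {entry.model}")
    return chain


# Usage: after the shutdown, the Fable 5 entry is simply deleted from the file.
config = {"fallback_chain": [
    {"provider": "deepseek", "model": "deepseek-v4-pro", "tier": "frontier"},
    {"provider": "openai", "model": "gpt-5.5", "tier": "frontier"},
]}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "models.json"
    path.write_text(json.dumps(config))
    chain = load_chain(path)
print([e.model for e in chain])  # ['deepseek-v4-pro', 'gpt-5.5']
```

Validating at startup means a bad config fails the deploy rather than the first request during an outage.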

Task classification for routing

Not every task needs a frontier model. The mistake most teams make is routing everything to their best model and calling it a day. That works until the pricing change hits.

Build a task classifier that categorizes every incoming request:

Classification | Task examples | Model tier
Easy | Boilerplate code, simple edits, documentation | Budget models ($0.09-0.30 per 1M tokens)
Medium | Multi-file changes, moderate logic | Mid-tier models ($0.30-1.50 per 1M tokens)
Hard | Architecture decisions, complex debugging, large refactors | Frontier models ($1.50-10 per 1M tokens)
Needs info | Ambiguous or underspecified prompts | Route to clarification
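The table above can be turned into a first-pass classifier. This is a heuristic sketch: the keyword lists, the word-count cutoff, and the files_touched threshold are illustrative assumptions, and in practice many teams use a small, cheap model as the classifier and validate it against their eval suite.

```python
from enum import Enum


class Tier(Enum):
    EASY = "budget"
    MEDIUM = "mid"
    HARD = "frontier"
    NEEDS_INFO = "clarify"


# Illustrative keyword hints; tune them on your own request logs.
HARD_HINTS = ("architecture", "refactor", "debug", "race condition", "design")
EASY_HINTS = ("docstring", "rename", "typo", "boilerplate", "readme", "documentation")


def classify(request: str, files_touched: int = 1) -> Tier:
    text = request.lower()
    if len(text.split()) < 4:
        return Tier.NEEDS_INFO  # too short to act on safely; ask first
    if any(h in text for h in HARD_HINTS) or files_touched > 5:
        return Tier.HARD
    if files_touched > 1:
        return Tier.MEDIUM  # multi-file change
    if any(h in text for h in EASY_HINTS):
        return Tier.EASY
    return Tier.MEDIUM  # default to the middle tier when no rule fires


print(classify("fix it"))                                  # Tier.NEEDS_INFO
print(classify("Add a docstring to the parse function"))   # Tier.EASY
print(classify("Refactor the auth module into services"))  # Tier.HARD
```

Defaulting to MEDIUM is a deliberate hedge: sending an easy task up a tier wastes a little money, while sending a hard task down a tier can waste a whole turn.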

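Router decisions should be sticky across tool-call follow-ups within a single turn. Switching models mid-turn invalidates prompt caches and hands the new model a transcript full of another model's tool calls and partial reasoning, which it may misread. Treat cache eviction as a real cost.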
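Stickiness can be as simple as memoizing the routing decision per turn. A minimal sketch: StickyRouter, turn_id, and the route callback are hypothetical names; the point is that tool-call follow-ups reuse the first decision instead of being re-classified.

```python
from typing import Callable


class StickyRouter:
    def __init__(self, route: Callable[[str], str]) -> None:
        self._route = route  # picks a model name for a request
        self._decisions: dict[str, str] = {}

    def model_for(self, turn_id: str, request: str) -> str:
        # The first call in a turn decides; later tool-call follow-ups reuse it,
        # so the whole turn stays on one provider and its prompt cache.
        if turn_id not in self._decisions:
            self._decisions[turn_id] = self._route(request)
        return self._decisions[turn_id]

    def end_turn(self, turn_id: str) -> None:
        # The next user turn is free to be routed to a different model.
        self._decisions.pop(turn_id, None)


calls: list[str] = []


def route(request: str) -> str:
    calls.append(request)  # record how often the classifier actually runs
    return "frontier-model" if "refactor" in request else "budget-model"


router = StickyRouter(route)
first = router.model_for("turn-1", "refactor the parser")
follow_up = router.model_for("turn-1", "tool result: 3 tests failed")
print(first, follow_up, len(calls))  # frontier-model frontier-model 1
router.end_turn("turn-1")
```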

How do I build a model fallback chain?

A fallback chain is an ordered list of models. When the primary model fails, the gateway tries the next one. When a fallback succeeds, the agent never sees the failure.

The chain should reflect your priorities:

fallback_chain = [
    {"provider": "deepseek", "model": "deepseek-v4-pro", "tier": "frontier"},
    {"provider": "openai", "model": "gpt-5.5", "tier": "frontier"},
    {"provider": "anthropic", "model": "claude-sonnet-4", "tier": "mid"},
    {"provider": "google", "model": "gemini-2.5-pro", "tier": "mid"},
]

Each entry specifies the provider, model, and cost tier. The gateway routes based on the task classification and falls back along the chain when a provider is unreachable or returns errors.
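Routing by tier and falling back along the chain fit in one loop. A minimal sketch, assuming a hypothetical call_provider adapter that raises ProviderError on retryable failures such as a 503 or a timeout; the tier ordering is also an assumption. A fake adapter simulates DeepSeek being down.

```python
from typing import Callable

# Assumed tier ordering, cheapest first.
TIER_RANK = {"budget": 0, "mid": 1, "frontier": 2}


class ProviderError(Exception):
    """Raised by an adapter on a retryable failure (e.g. HTTP 503 or timeout)."""


class AllProvidersFailed(Exception):
    """Raised when every eligible entry in the chain has failed."""


def complete_with_fallback(
    chain: list[dict],
    min_tier: str,
    prompt: str,
    call_provider: Callable[[dict, str], str],
) -> tuple[str, str]:
    """Return (model, completion) from the first eligible entry that succeeds."""
    # Keep models at or above the task's tier, cheapest tier first;
    # sorted() is stable, so chain order is preserved within a tier.
    eligible = sorted(
        (e for e in chain if TIER_RANK[e["tier"]] >= TIER_RANK[min_tier]),
        key=lambda e: TIER_RANK[e["tier"]],
    )
    errors = []
    for entry in eligible:
        try:
            return entry["model"], call_provider(entry, prompt)
        except ProviderError as exc:
            errors.append(f"{entry['model']}: {exc}")  # record, then try the next one
    raise AllProvidersFailed("; ".join(errors) or "no eligible models")


fallback_chain = [
    {"provider": "deepseek", "model": "deepseek-v4-pro", "tier": "frontier"},
    {"provider": "openai", "model": "gpt-5.5", "tier": "frontier"},
    {"provider": "anthropic", "model": "claude-sonnet-4", "tier": "mid"},
    {"provider": "google", "model": "gemini-2.5-pro", "tier": "mid"},
]


def fake_call(entry: dict, prompt: str) -> str:
    if entry["provider"] == "deepseek":
        raise ProviderError("503 Service Unavailable")
    return f"{entry['model']} says ok"


print(complete_with_fallback(fallback_chain, "frontier", "hi", fake_call))
# ('gpt-5.5', 'gpt-5.5 says ok')
```

Filtering by min_tier keeps a hard task from silently degrading to a mid-tier model, while a medium task can still escalate to a frontier model if every mid-tier provider is down. Whether that trade-off suits you is a product decision.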

What to do about prompt differences

This is the hard part. Prompts tuned for one model degrade on another. Claude prefers structured XML-style prompts. GPT works better with markdown. DeepSeek sits somewhere in between.

The pragmatic approach is per-family prompt templates. Each model family gets its own prompt template, versioned and tested alongside the gateway config. When you switch a model, you switch the prompt template too. The gateway handles this mapping transparently.

Start with a small set of prompt templates for each model family. Test them against your core agent workflows. Version control both the templates and the model config. When a model update breaks your prompts, you can roll back the config, not the code.
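A versioned template registry keyed by model family might look like this. A minimal sketch using the standard library's string.Template; the family names, template text, and version keys are illustrative assumptions, not tested prompts.

```python
from string import Template

# Templates keyed by (model family, template version).
TEMPLATES = {
    ("claude", "v2"): Template(
        "<instructions>$instructions</instructions>\n<task>$task</task>"
    ),
    ("gpt", "v2"): Template("## Instructions\n$instructions\n\n## Task\n$task"),
    ("deepseek", "v1"): Template("Instructions:\n$instructions\n\nTask:\n$task"),
}

# Lives in version-controlled config next to the fallback chain, so rolling
# back a template is a config change.
MODEL_CONFIG = {
    "claude-sonnet-4": {"family": "claude", "template_version": "v2"},
    "gpt-5.5": {"family": "gpt", "template_version": "v2"},
    "deepseek-v4-pro": {"family": "deepseek", "template_version": "v1"},
}


def render_prompt(model: str, instructions: str, task: str) -> str:
    cfg = MODEL_CONFIG[model]
    template = TEMPLATES[(cfg["family"], cfg["template_version"])]
    # substitute() raises KeyError if the template expects a field we did not pass.
    return template.substitute(instructions=instructions, task=task)


print(render_prompt("gpt-5.5", "Be terse.", "Fix the flaky test."))
```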

How do I test a model fallback chain?

You can’t trust a fallback chain you’ve never tested. Run chaos experiments before you need them:

  1. Block each provider’s API endpoint in your gateway config and verify the agent still completes tasks.
  2. Measure latency differences between providers for the same task classification.
  3. Compare output quality across providers using your eval suite.
  4. Calculate the cost difference of running on each fallback tier.
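Steps 1 and 2 above can be automated with a small harness. A minimal sketch: chaos_check and run_task are hypothetical names, removing a provider from the chain stands in for blocking its endpoint, and "completed" is a crude pass criterion; a real harness would call your gateway and score outputs with your eval suite.

```python
import time
from typing import Callable

Chain = list[dict]


def chaos_check(
    chain: Chain,
    tasks: list[str],
    run_task: Callable[[Chain, str], str],
) -> dict[str, dict]:
    report = {}
    for blocked in chain:
        # Simulate an outage by dropping one provider from the gateway config.
        degraded = [e for e in chain if e is not blocked]
        completed, start = 0, time.perf_counter()
        for task in tasks:
            try:
                run_task(degraded, task)
                completed += 1
            except Exception:
                pass  # counted as a failed task; a real harness would log it
        report[blocked["provider"]] = {
            "completed": completed,
            "total": len(tasks),
            "seconds": time.perf_counter() - start,  # rough latency signal
        }
    return report


chain = [
    {"provider": "deepseek", "model": "deepseek-v4-pro"},
    {"provider": "openai", "model": "gpt-5.5"},
]


def run_task(active_chain: Chain, task: str) -> str:
    # Fake gateway call: use the first provider left in the chain.
    if not active_chain:
        raise RuntimeError("no providers left")
    return f"{active_chain[0]['model']} did {task}"


for provider, stats in chaos_check(chain, ["lint", "refactor"], run_task).items():
    print(f"blocked {provider}: {stats['completed']}/{stats['total']} tasks completed")
```

A chain with a single provider fails this check immediately, which is exactly the signal you want before an outage rather than during one.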

The Pulumi team’s 2026 survey of agent infrastructure also cites the $37 billion figure for 2025 enterprise AI spend, and Gartner predicts organizations will use small, task-specific models three times as often as general-purpose LLMs by 2027. The economics are moving toward multi-model routing whether you plan for it or not.

FAQ

What is a model-agnostic agent architecture? A model-agnostic agent separates orchestration, context management, and governance from the model layer. Instead of hardcoding one provider’s API and prompt format, you abstract model interactions behind a gateway that lets you swap providers by changing config, not code.

Why did the US government ban Fable 5? The US government issued an export control directive under national security authorities, ordering Anthropic to suspend access to Fable 5 and Mythos 5 by any foreign national. Anthropic had to disable the models for all customers because it couldn’t distinguish users by nationality.

What are the key components of a model-agnostic stack? Three layers: a swappable model layer (provider-specific SDKs), a context and abstraction layer (normalizes API format, prompt templates, and tool schemas across providers), and an orchestration layer (workflow sequencing, routing, governance). LiteLLM and Portkey are common tools for the abstraction layer.

How do you route between multiple models? Classify tasks by complexity. Easy tasks go to budget models, medium to mid-tier, hard to frontier. Route decisions are sticky within a turn to avoid cache thrashing. Implement fallback chains so if one provider fails or degrades, the next in line takes over.


This article was published on Agentic Up (https://agenticup.dev). Practical guides for developers and founders building with AI agents.
