THINK · Jun 19, 2026

AWS just turned the agent harness into a managed service

Amazon Bedrock AgentCore harness went GA on June 18. Two API calls to a production-grade agent. Multi-model switching. Managed memory. Auto-traced. And it validates the 15-jobs framework.

Agent-ready: drop this post into Claude Code or Codex

TL;DR: AWS made the agent harness a managed service. Amazon Bedrock AgentCore harness went GA on June 18 with two API calls, multi-model switching mid-session, auto-provisioned memory, and CloudWatch tracing built in. It validates the 15-jobs framework by implementing it as infrastructure: each primitive (Runtime, Memory, Gateway, Browser, Identity, Observability) is a separate service, and the harness is the wiring. The tradeoff: you trade framework lock-in for cloud-vendor lock-in, but you get versioning, rollback, A/B testing, and evaluations out of the box.

Key takeaways:

  • AWS AgentCore harness went GA on June 18. Two API calls to a production-grade agent with sandboxed compute, memory, skills, tools, and observability
  • Multi-model switching mid-session: plan with Claude Opus, code with GPT-5.5, summarize with Gemini, all in one conversation with context preserved
  • Managed memory auto-provisions with SEMANTIC + SUMMARIZATION strategies and 30-day event expiry. Or bring your own AgentCore Memory
  • Skills catalog ships curated AWS expertise (SDK, IAM, CloudWatch, EC2, analytics). Or attach git repos, S3 bundles, or local paths The 15-jobs framework predicted this decomposition. The question is whether managed composition creates a different kind of lock-in than framework bundling

The first agent I made it to production took three weeks. The agent loop, model call, tool response, repeat: that took an afternoon. The deployment plumbing took the other nineteen days.

I needed a container. I needed to figure out where secrets lived. I needed a database for session state, a queue for tool calls, a bucket for file outputs, a log aggregator. I needed IAM roles that were neither too wide nor too narrow. Every piece was a solved problem in isolation. Together, they were a wall.

That wall is what AWS turned into two API calls.

On June 18 at the AWS New York Summit, Amazon Bedrock AgentCore harness went generally available. The official announcement calls it two API calls to a production-grade agent: CreateHarness to define an agent, InvokeHarness to run it. The harness handles the sandboxed environment, memory, identity, networking, observability, and versioning.

This isn’t a framework. It’s infrastructure.

Why is the harness the hard part, not the model?

The agent loop is the part everyone talks about. The model picks a tool, executes it, gets a result, picks the next tool. That’s the visible work. The invisible work is everything that has to be true for that loop to run safely in production.

The agent needs somewhere to run compute that’s isolated, ephemeral, and reproducible. It needs to remember who the user is and what happened in the last session. It needs to call tools without leaking credentials. It needs to switch models when a particular task demands a different capability. It needs to log every step so you can debug what went wrong. It needs to version its setup so you can roll back a bad prompt update without redeploying.

These are the 15 jobs that every agent harness must do, as defined in my earlier post. None of them are about model intelligence. All of them are about operational rigor.

Before AgentCore harness, you had two choices. Pick a framework (LangChain, LangGraph, CrewAI) that bundles all 15 jobs into one install. You accept that you can’t swap one without touching all 15. Or build your own harness by composing individual services. Then accept the months of plumbing to wire them together.

AWS built a third option. It decomposed the 15 jobs into independently usable primitives (Runtime, Memory, Gateway, Browser, Identity, Observability), then added a managed harness that wires them together as a config layer. The harness is the composition you would have built yourself: you declare it, not build it.

How does multi-model switching work mid-session?

Different tasks need different models. Planning needs a model that reasons deliberately. Coding needs a model that generates syntactically precise output. Summarization needs a model that compresses without losing signal. No single model excels at all three.

The harness lets you switch providers mid-session without losing context.

On CreateHarness, you set a default model. On any single InvokeHarness call, you override it by setting the model field to one of four provider types: bedrock (any model served on Amazon Bedrock: Anthropic Claude, Amazon Nova, Meta Llama, DeepSeek, Qwen, Kimi, MiniMax, Cohere, Mistral, OpenAI GPT-5.5 and GPT-5.4), openAi (direct OpenAI API), gemini (Google Gemini), or liteLlm (any provider LiteLLM supports: Anthropic direct, Cohere, Mistral, Vertex, Azure OpenAI).

The model you override applies to that single invocation. The default stays in place for every other call.

The part that matters more: the switch preserves the conversation. You can start a session with Claude Opus to design the architecture, invoke the next turn with GPT-5.5 to write the code, and invoke again with Gemini to summarize what AWS built. The harness carries the message history across all three models. The agent doesn’t lose context when the model changes.

This isn’t a gimmick. Different models have different strengths, and the strength gradient is real. A model that scores 47 on the Coding Index may score 32 on long-context reasoning. The harness lets you route each turn to the model that fits the task, not the model that fits the average.

The tradeoff: Multi-model switching works within a single harness session. If you want different agents (not different models for the same agent) to collaborate, you need multi-agent orchestration that the harness doesn’t provide natively. Step Functions integration is available for workflow chaining, but true multi-agent coordination (agents that delegate to each other, negotiate, and share state) is still a custom build.

What does managed memory look like?

The 15-jobs framework splits memory into three concerns: session state (what happened in this conversation), long-term storage (user preferences, facts learned across sessions), and the compaction process that moves data between them.

AgentCore harness auto-provisions managed memory with sensible defaults: SEMANTIC + SUMMARIZATION strategies, 30-day event expiry, AWS-owned encryption, and multi-tenant isolation through namespace templates keyed on actorId. You get it by omitting the memory field on CreateHarness.

If your agent is stateless, set memory: { disabled: {} } and the harness skips memory entirely. If you already have an AgentCore Memory resource, pass its ARN and the harness uses it. Switching from managed to your own memory is one UpdateHarness call.

The managed memory is automatic but not opaque. It’s a real, addressable AWS resource. You can query it, attach it to a different agent, audit it, or feed it into an analytics pipeline. When you delete the harness, the managed memory is cascade-deleted by default, or you can set deleteManagedMemory: false to keep it.

The tradeoff: Managed memory means you don’t configure a database, write schema migrations, or manage retention policies. It also means your agent’s memory lives inside AWS. If your architecture requires the agent to access a shared memory store that spans multiple cloud providers or on-premise systems, the managed memory won’t fit. You can bring your own Memory resource in that case, but it requires more setup than the managed default.

How does the skills catalog work?

Giving an agent expertise on a specific domain usually means writing long system prompt blocks or baking knowledge into the container. The harness replaces both with a skill catalog.

Skills are bundles of files, scripts, and instructions. The harness loads skill metadata on session start and pulls full content into context only when the task calls for it. Four source types are available: awsSkills (turn on the AWS-curated skill bundle with zero plumbing; skills bake into the harness runtime, no network fetch needed), git (clone a public or private repo over HTTPS, pinned to a commit or branch), s3 (pull from your own S3 bucket), and path (reference a path in the container you brought).

The AWS-curated skills repository covers the full AWS surface area: core skills (SDK usage, IaC, IAM, CloudWatch, Bedrock) and service-specific workflows for analytics, databases, EC2, networking, security, serverless, and storage. A paths glob scopes the skills: you turn on only the bundles your agent needs.

You also get per-call skill layering on InvokeHarness. Need a different skill set for a single invocation? Pass it in the call. The default stays for everything else.

The tradeoff: The curated AWS skills are a superpower for agents that operate inside AWS. They are useless for agents that work across cloud providers or talk to third-party APIs that have no AWS skill equivalent. You can fill the gap with git or S3 skills, but those require someone to author and maintain the skill bundles. AWS maintains the curated AWS skills. You maintain the git-hosted skills.

What about observability?

Every harness invocation crosses runtime, memory, gateway, browser, and code interpreter. Stitching observability across five services used to mean opening five tabs.

At GA, the AgentCore console shows a single observability widget per harness. It summarizes every primitive the harness touched, plus per-primitive sections that appear only for the primitives you configured.

CloudWatch GenAI Observability has a new Harnesses tab. You drill from harness into session into a single trace: every step, every model call, every tool invocation, every memory retrieval, every browser interaction. Logs from every primitive surface inline at the right span. You don’t hop between log groups to piece together what happened.

The tradeoff: You get observability without writing any instrumentation code. That’s the upside. The downside is that observability lives in CloudWatch. If your observability stack is Datadog, Grafana, or SigNoz, you either export CloudWatch logs to your tool or run a second observability layer on top. The harness doesn’t emit OpenTelemetry natively. You get CloudWatch or nothing.

Can you move from config to code?

The harness is declarative by design. You configure, you don’t code. But config has a ceiling. When a use case outgrows what a config file can express, custom orchestration, multi-agent coordination, deep instrumentation: you need code.

The harness ships an escape hatch: agentcore export harness --name myHarness --output ./my-agent. One CLI command exports the full harness setup as Strands-based code that runs on AgentCore Runtime or anywhere else. The exported project preserves your model, prompt, tools, memory wiring, skills, and container environment. Same compute path, same observability, same identity primitives.

The graduation is a config-to-code translation, not an architecture switch. You don’t rebuild from scratch when config stops being enough. You export, extend, and deploy.

Claude Agent SDK support is coming soon as a second export target.

The tradeoff: The export target is Strands, an AWS-specific framework. If your team standardizes on LangGraph or another non-AWS framework, the export doesn’t produce code you can use directly. Claude Agent SDK support will help, but at launch, graduating from config means committing to the Strands programming model.

How does this compare to the 15-jobs framework?

The 15-jobs framework argued that an agent harness is 15 separate concerns, and bundling them into a framework creates lock-in. The right architecture uses decomposed components with a composition layer on top.

AgentCore harness is exactly that architecture. AWS built Runtime (job 1-2: provision and persist), Identity (job 3: credential resolution), the per-turn state machine (job 4), skills (job 5: skill bodies), system prompt assembly (job 6), streaming (job 7), Gateway with policy (job 8-9: policy gate and approvals), budgets (job 10), sessions (job 12), events (job 14), and tracing (job 15): each as a separately usable service.

The harness is the composition layer. It wires the primitives together based on config. When you swap a model, you don’t touch memory. When you change retention policy, you don’t touch the state machine. The decomposition works exactly as the framework predicted.

The tradeoff the framework did not predict: AWS manages the harness. You don’t operate it. You also don’t control it. The 15-jobs framework assumed you would compose the components yourself, which gives you the freedom to replace any piece with an alternative. The harness gives you AWS’s composition, which means every piece is an AWS service. You can replace components only as far as the AWS ecosystem extends.

You trade framework lock-in for cloud-vendor lock-in. For teams already on AWS, that’s a net positive. The harness eliminates an entire class of operational work. For teams on multi-cloud or hybrid infrastructure, the harness represents a dependency that’s hard to unwind.

What is the catch?

No additional harness fee. You pay for underlying resource consumption only, as detailed on the AgentCore pricing page. Runtime compute at $0.0895 per vCPU-hour (₹7.48/hr) and $0.00945 per GB-hour (₹0.79/hr), billed on active consumption: you pay only when the CPU is computing, not while the agent waits for model responses or tool I/O. Browser and Code Interpreter use the same active-consumption model. Gateway costs per 1,000 invocations and per 1,000 search queries. Memory costs per 1,000 events and retrievals. Web Search costs $7 per 1,000 queries (₹585). Model inference uses standard Bedrock or third-party rates.

The pricing is reasonable for what you get. An agent that runs for 60 seconds and calls two tools costs pennies. An agent that runs for an hour with heavy compute costs accordingly. You pay proportionally to what your agent uses.

The real catch isn’t pricing. It’s portability.

The harness is deeply integrated with AWS: IAM for identity, CloudWatch for observability, ECR for containers, EFS and S3 for filesystems, Step Functions for workflows. Every integration is best-in-class for AWS and zero for everywhere else. If your infrastructure strategy changes, the harness setup doesn’t follow you.

AWS announced Claude Agent SDK as a future export target, which suggests they understand this concern. But at launch, the harness is an AWS product for AWS customers.

What does this mean for your agent architecture?

The harness validates something the 15-jobs framework argued but could not prove: the agent harness isn’t a framework decision. It’s an infrastructure decision.

Frameworks (LangChain, LangGraph, CrewAI) sell you a programming model. You write code inside their abstractions, and you get an agent. The harness sells you infrastructure. You declare what you want, and you get an agent. The programming model is the CLI and the API, not Python imports.

For teams that are building agents on AWS, the harness eliminates the months of plumbing that stood between an agent prototype and a production deployment. You still need to design the agent: which model, which tools, which skills, which policies. But you don’t need to build the infrastructure that runs it.

For teams that aren’t on AWS, the harness signals where the industry is heading. Cloud vendors are commoditizing the agent harness layer just as they commoditized compute, storage, and databases before it. Every major cloud provider will ship a managed harness within the next year. The question isn’t whether you use AWS AgentCore. It’s whether your architecture works for a world where the harness is infrastructure, not code.

FAQ

Is the harness free to use? Yes. There’s no separate harness fee. You pay for the underlying AgentCore resources your agent consumes: Runtime compute, Memory, Gateway, Browser, Code Interpreter, and model inference: each at its own consumption-based rate.

Can I use models outside AWS Bedrock? Yes. The harness supports four model provider types: bedrock (Bedrock-hosted models), openAi (direct OpenAI API), gemini (Google Gemini), and liteLlm (any provider LiteLLM supports, including Anthropic direct, Cohere, Mistral, Vertex, and Azure OpenAI).

Does the harness work with my existing LangGraph agent? Not directly. The harness is a standalone managed service, not a runtime for framework-based agents. However, you can export a harness setup to Strands-based code, and you can run LangGraph agents on AgentCore Runtime (the underlying compute layer) separately from the harness.

Can I use the harness for multi-agent orchestration? The harness manages one agent. For multi-agent workflows, you can chain invocations through Step Functions integration. True multi-agent coordination (delegation, negotiation, shared state) requires custom orchestration on top.

What happens if AWS changes the harness API? Every UpdateHarness creates an immutable version. Named endpoints (PROD, STAGING) stay pinned until you explicitly promote them. API changes don’t affect running agents. You control when to adopt new versions.


This article was published on Agentic Up (https://agenticup.dev): practical guides for developers and founders building with AI agents. Reach me at [email protected]

Newsletter

Get the brief on AI agents

Practical posts on shipping agents, automating work, and building in public. No hype, no fluff.

Contact: [email protected]