---
title: How AI coding agents actually use your SDK
canonical: "https://agenticup.dev/posts/how-ai-coding-agents-use-your-sdk/"
pubDate: "2026-06-10T00:00:00.000Z"
description: "You ship an SDK. AI coding agents consume it differently than humans do. Here's the exact step-by-step trace of what happens between 'developer types a prompt' and 'agent generates code with your technology', and how to design for it."
tags: [coding-agents, sdk-design, agent-tools, mcp, harness, claude-code, copilot]
---

TL;DR: Microsoft's DevBlogs team traced exactly how AI coding agents discover and use your SDK, API, CLI or MCP server. The key insight: your tool description competes for context window space, the model decides whether to use it or guess from training memory, and most teams don't test with real agents.

You ship an SDK, a CLI, or an API. Developers use it. Now AI coding agents use it too, but they use it differently than humans do.

I've built and used enough coding agents (Claude Code, Copilot, Cursor, OpenCode, Hermes) to know that most teams have no idea what happens between a developer typing a prompt and the agent generating code with their technology. Is the agent reading your docs? Calling your MCP server? Ignoring both and guessing from training data?

Microsoft's Developer Advocacy team recently published a detailed trace of this exact flow: how AI coding agents consume developer tools. It's one of the most practical pieces I've seen on the topic, so I'm breaking it down here with my own commentary from the agent-building side.

> **Key takeaways:**
> - AI coding agents discover your SDK through context assembly: your tool description competes with everything else in the context window
> - The model reads tool descriptions, open files, and workspace config all at once, then decides whether to call your tool or generate code from training memory
> - MCP servers that return live schema information beat documentation that the model may or may not have seen during pre-training
> - Keep tool descriptions short and focused: harnesses have per-description token limits and will truncate or drop verbose ones
> - Test your tooling with real agents, not just curl: the gap between "works in a demo" and "works with Claude Code" is real

## The full trace

The Microsoft article traces seven steps from "developer types a prompt" to "agent emits code." Here's the condensed version:

### Step 1: The harness assembles context

Before the model sees anything, the harness (Copilot, Claude Code, Cursor) builds the context window. It pulls together:

- **Open editor tabs**: every file the developer has open
- **Workspace structure**: folder hierarchy, config files, dependency manifests
- **Terminal history**: recent commands and output
- **Tool definitions**: every MCP server, every CLI, every extension
- **Environment details**: OS, working directory path, language runtime versions

The harness has a token budget. If you have 20 extensions installed, the harness might summarize tool descriptions, drop some entirely, or rank them by estimated relevance. Your SDK's tool description is competing for space before the model even sees it.

**What this means for you:** If your MCP server or CLI tool description exceeds the harness's description limit (each harness sets its own), it may be truncated or dropped before the model ever sees it. Keep descriptions tight: a sentence on what the tool does, a sentence on when to call it.
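To make the budget pressure concrete, here is a minimal sketch of how a harness *might* fit tool descriptions into a fixed budget. Everything in it is an assumption for illustration: the `ToolDescription` type, the `relevance` score, the four-characters-per-token estimate, and the limits are hypothetical, and real harnesses use their own tokenizers and ranking logic.

```python
from dataclasses import dataclass


@dataclass
class ToolDescription:
    name: str
    text: str
    relevance: float  # hypothetical score a harness might assign


def estimate_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token); real harnesses use a tokenizer.
    return max(1, len(text) // 4)


def pack_tool_descriptions(tools, budget_tokens, per_tool_limit):
    """Keep the most relevant descriptions that fit; truncate long ones, drop the rest."""
    kept = []
    remaining = budget_tokens
    for tool in sorted(tools, key=lambda t: t.relevance, reverse=True):
        text = tool.text
        if estimate_tokens(text) > per_tool_limit:
            # Truncate to the per-description limit.
            text = text[: per_tool_limit * 4]
        cost = estimate_tokens(text)
        if cost > remaining:
            continue  # dropped: no room left in the budget
        kept.append(ToolDescription(tool.name, text, tool.relevance))
        remaining -= cost
    return kept
```

In this sketch a verbose description loses its tail, and a low-relevance tool disappears completely once the budget is spent. A short description survives both cuts.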

### Step 2: The model reads the room

The model receives the assembled context and reads it *all at once*: the system prompt, tool descriptions, workspace context, and the developer's prompt.

This is where training data matters. If the model has seen your technology during pre-training, it already has opinions about your API patterns, SDK conventions, and common error messages. If it hasn't, it has nothing, and it'll either ask for help or guess based on similar technologies.

**What this means for you:** The model's pre-training cutoff date determines whether it knows about your latest API version. If you shipped v2 after the cutoff, the agent will keep generating v1 code unless you ground it with explicit documentation or tools.

### Step 3: Tool selection

The model decides whether it needs to call a tool or can generate code from knowledge. This is a cost-benefit calculation embedded in the model's training: *calling a tool costs a round trip and tokens, but produces reliable information. Generating from memory is free but may be stale or wrong.*

Models generally prefer to use tools when they're confident the tool will return useful information, and fall back to parametric knowledge when tool descriptions are unclear or the context window is too full.

**What this means for you:** MCP servers and CLI tools that provide *verifiable* information (current API versions, actual schema shapes, live error messages) are more likely to be used than documentation that requires the model to parse and retain details.
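As a rough illustration of "verifiable information", the sketch below shows a tool handler that answers from a source of truth maintained next to the code rather than from anything the model remembers. The `CURRENT_API` registry, the `get_schema` name, and the `create_user` operation are all hypothetical. This is plain Python, not the API of any particular MCP SDK.

```python
import json

# Hypothetical source of truth the SDK team maintains alongside the code.
CURRENT_API = {
    "version": "2.0.0",
    "operations": {
        "create_user": {"params": {"email": "string", "display_name": "string"}},
    },
}


def get_schema(operation: str) -> str:
    """Tool handler: return the current schema for one operation as compact JSON."""
    spec = CURRENT_API["operations"].get(operation)
    if spec is None:
        # An explicit error plus the valid options gives the agent a way to recover.
        known = sorted(CURRENT_API["operations"])
        return json.dumps({"error": f"unknown operation {operation!r}", "known": known})
    return json.dumps({"version": CURRENT_API["version"], "operation": operation, **spec})
```

The point of the design is that the answer carries the version with it, so an agent whose training data predates v2 still sees the current shape.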

### Step 4: Tool invocation

If the model decides to use a tool, it emits a tool call in the format the harness expects. The harness executes it, captures the output, and feeds it back into the context window.
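The loop described above can be sketched in a few lines. This is a simplified assumption of what a harness does, not any specific product's implementation: the call shape (`name` plus `arguments`), the `registry` mapping, and the message format appended to `context` are all hypothetical.

```python
import json


def run_tool_call(call: dict, registry: dict, context: list) -> list:
    """Execute one model-emitted tool call and append the result to the context."""
    handler = registry.get(call["name"])
    if handler is None:
        result = {"error": f"unknown tool {call['name']!r}"}
    else:
        try:
            result = {"output": handler(**call.get("arguments", {}))}
        except Exception as exc:
            # Report failures back to the model instead of crashing the session.
            result = {"error": str(exc)}
    context.append({"role": "tool", "name": call["name"], "content": json.dumps(result)})
    return context
```

Note that errors go back into the context too, which is why clear, explicit error messages from your tool matter: they are the next thing the model reads.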

This is the most expensive step in tokens: the tool call itself, the output, and the model's next read all consume context. A chatty tool that returns unnecessary metadata will eat into the budget the model needs for actual code generation.
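One way to keep a tool from being chatty is to whitelist the fields the agent actually needs before returning anything. The helper below is a minimal sketch under that assumption. The `compact_tool_output` name, the field names in the usage example, and the `max_chars` cap are hypothetical.

```python
import json


def compact_tool_output(raw: dict, keep: tuple, max_chars: int = 2000) -> str:
    """Return only the fields the agent needs, serialized without extra whitespace."""
    slim = {key: raw[key] for key in keep if key in raw}
    text = json.dumps(slim, separators=(",", ":"))
    if len(text) > max_chars:
        # A hard cap as a last resort; the truncated text is no longer valid JSON.
        text = text[:max_chars] + "...[truncated]"
    return text
```

For example, `compact_tool_output({"version": "2.0.0", "request_id": "abc", "debug": {"trace": [1, 2, 3]}}, keep=("version",))` returns `{"version":"2.0.0"}` and leaves the request ID and debug trace out of the context window.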

### Steps 5-7: Code generation and iteration

With the tool output back in context, the model generates code. Then the developer reviews it, asks for changes, and the cycle repeats.

## Designing for agent consumption

The full Microsoft post goes deeper into each step, but the practical takeaways for anyone shipping developer tools are:

**Write tool descriptions for agents, not humans.** A human can handle "This command configures authentication for the Contoso Identity platform, supporting OAuth 2.0, OpenID Connect, and SAML 2.0 with configurable token lifetimes and custom claim mapping." An agent needs "Configures authentication. Run this before any auth-dependent operations." The detail belongs in the tool's output, not its description.
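If you want to enforce this in CI, a word-count check is a crude but workable starting point. The sketch below is an assumption, not a standard: the `check_description` helper and the 20-word limit are hypothetical, and the right limit depends on the harnesses you target.

```python
def check_description(text: str, max_words: int = 20) -> list:
    """Return a list of problems; an empty list means the description passes."""
    problems = []
    words = text.split()
    if not words:
        problems.append("description is empty")
    elif len(words) > max_words:
        problems.append(f"{len(words)} words, limit is {max_words}")
    return problems


verbose = (
    "This command configures authentication for the Contoso Identity platform, "
    "supporting OAuth 2.0, OpenID Connect, and SAML 2.0 with configurable token "
    "lifetimes and custom claim mapping."
)
concise = "Configures authentication. Run this before any auth-dependent operations."
```

With these inputs, `check_description(concise)` returns an empty list, while `check_description(verbose)` flags the 25-word description as over the limit.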

**Ground your API in something the agent can read.** An MCP server that returns live schema information beats documentation that the model may or may not have seen during training. If you can't build an MCP server, at minimum make sure your README and getting-started docs are parseable by an agent: clear code examples, explicit error messages, structured type definitions.

**Test your tooling with actual agents.** Run Claude Code or Copilot against your MCP server and see what happens. Does the agent understand when to call your tool? Does the output fit in a reasonable context window? Does the agent generate correct code on the first try, or does it need multiple iterations?

I've seen too many teams ship MCP servers that look great on paper but fail the first time an agent talks to them. The difference between a tool that works in a demo and one that works in production is testing it against real agent behavior, not just curl commands.

The full Microsoft post is worth reading if you ship developer tools:
[https://developer.microsoft.com/blog/how-ai-coding-agents--use-your-technology](https://developer.microsoft.com/blog/how-ai-coding-agents--use-your-technology)

I've been writing about [how agent harnesses work](/posts/agent-harness-15-jobs/) and [what makes a good agent tool](/posts/best-ai-agent-frameworks-2026/). This Microsoft post is a great companion piece that traces the exact mechanics from the tool provider's perspective.

Also see [Your AI Agent Just Scaffolded a Project from 2020](/posts/ai-agent-silent-version-drift/), on the version pinning problem that surfaces when agents use `npx` without version constraints.

## Related Posts

- [Is your agent extension working?](/posts/ai-agent-extension-evaluation/): How to measure whether your MCP server or agent extension produces real lift vs drag
- [Best MCP servers in 2026](/posts/best-mcp-servers-2026/): A curated list of the most useful Model Context Protocol servers for AI coding agents
- [Best AI coding agents in 2026](/posts/best-ai-coding-agents-2026/): Comparing Claude Code, Cursor, Copilot, and OpenCode for development workflows
- [Cursor vs Claude Code vs Copilot](/posts/cursor-vs-claude-code-vs-copilot-comparison/): A six-month comparison of the three major AI coding tools on real development tasks



A [2026 guide on AI coding agents](https://medium.com/@dave-patten/the-state-of-ai-coding-agents-2026-from-pair-programming-to-autonomous-ai-teams-b11f2b39232a) covers the full landscape from pair programming to autonomous teams. The [AI Agents subreddit](https://www.reddit.com/r/AI_Agents/comments/1rdf5v7/my_guide_on_what_tools_to_use_to_build_ai_agents/) discusses which tools agents use and how SDKs are consumed.

---

This article was published on Agentic Up (https://agenticup.dev): practical guides for developers and founders building with AI agents. Reach me at hello@agenticup.dev.
