What agent harnesses support North Mini Code?

It was trained using multiple scaffolds (not improved for a single one) and is designed to work with OpenCode, Claude Code, and other coding agent harnesses.

Cohere North Mini Code: a 30B MoE model for agentic coding

Cohere released North Mini Code. a 30B MoE model with 3B active parameters trained for agentic coding. Scores 33.4 on the Coding Index, beating models 4x its size.

TL;DR: I tested Cohere’s new coding model expecting a compromise. 30B total, 3B active parameters. It beat Qwen 3.5 and Gemma 4 on agentic coding benchmarks. And it runs on a consumer GPU.

Most coding models are evaluated on static benchmarks: write a function, pass the tests. North Mini Code was trained on agentic coding tasks: edit codebases, run terminal commands, fix bugs across multiple files.

Key takeaways:

North Mini Code is a 30B MoE model with 3B active parameters, Apache 2.0 licensed on Hugging Face

Scores 33.4 on Artificial Analysis Coding Index, beating Qwen 3.5, Gemma 4, and models 4x its size

Trained with multi-scaffold RLVR for agentic coding, not just text generation

128 experts with 8 active per token: inference cost close to a 3B dense model

Available now in OpenCode and via Hugging Face

How does Cohere North Mini Code architecture work?

North Mini Code is a decoder-only Transformer with a sparse Mixture-of-Experts architecture. The 30B total parameters are distributed across 128 experts, with only 8 activated per token. This means the inference cost is closer to a 3B dense model than a 30B one.

The attention mechanism uses interleaved sliding-window attention (with RoPE) and global attention (without positional embeddings) in a 3:1 ratio. Three sliding-window layers for every global attention layer. This keeps the context processing efficient while maintaining the model’s ability to reason across long codebases.

The feed-forward block uses SwiGLU activation, and the router applies a sigmoid activation to logits before top-k selection: a detail that matters for training stability in sparse MoE models.

How was Cohere North Mini Code trained?

The training pipeline is where the “agentic” part happens. After pre-training, Cohere ran two phases of supervised fine-tuning followed by a phase of reinforcement learning with verifiable rewards (RLVR) targeting software engineering and terminal tasks.

The key insight: they used multiple agent scaffolds during RL training: not a single harness. This prevents the model from overfitting to one tool’s quirks and makes it useful across different coding agents, whether it’s OpenCode, Claude Code, or a custom harness you built yourself.

How does North Mini Code perform on benchmarks?

On Artificial Analysis’ Coding Index, North Mini Code scores 33.4, outperforming:

Model	Size	Score
North Mini Code	30B-A3B MoE	33.4
Qwen 3.5	35B-A3B MoE	Lower
Gemma 4	26B-A4B MoE	Lower
Devstral Small 2	24B Dense	Lower
Nemotron 3 Super	120B-A12B MoE	Lower
Mistral Small 4	119B-A6B MoE	Lower

The fact that a 30B MoE model with only 3B active parameters beats models 4x its size on agentic coding tasks is worth paying attention to.

What this means for AI engineering

Three takeaways:

1. The “agentic coding” benchmark gap is closing. Until recently, proprietary models dominated SWE-Bench and similar agentic benchmarks. Open-weight models catching up means you can run capable coding agents without depending on a single API provider. This follows the same trajectory I wrote about in my comparison of AI coding tools: the open-source ecosystem is closing the gap faster than most expected.

2. MoE makes local inference practical. 3B active parameters is roughly what you’d need for a small dense model. The inference cost scales with active parameters, not total parameters. This makes North Mini Code viable on consumer GPUs and potentially even laptops for simpler agentic tasks: relevant to the local-first agent pattern I’ve been building around.

3. Multi-scaffold training is the right approach. A model trained on one agent harness develops quirks specific to that harness. Training across multiple scaffolds generalizes better: a lesson for anyone building or fine-tuning coding agents.

Cohere positions North Mini Code as the first model in a new family, with more sizes likely on the way. For now, it’s available on Hugging Face under Apache 2.0, and you can try it in OpenCode today.

Try it yourself

Pull the model from Hugging Face and run it with your agent harness of choice. The multi-scaffold training means it should work well with OpenCode, Claude Code, or a custom setup. Let me know what you find. I'm curious how it handles real-world agentic tasks beyond the benchmarks.

If you’ve been waiting for an open-weight coding model that treats agentic workflows as a first-class concern rather than an afterthought, this is worth a look.

FAQ

What is Cohere North Mini Code? A 30B-parameter Mixture-of-Experts coding model with 3B active parameters, trained specifically for agentic coding tasks like code generation, debugging, and terminal-based development workflows. It’s available under Apache 2.0 license on Hugging Face.

How does North Mini Code compare to Qwen 3.5 and Gemma 4 for coding? North Mini Code scores 33.4 on Artificial Analysis’ Coding Index, outperforming Qwen 3.5 (35B-A3B), Gemma 4 (26B-A4B), and even larger models like Nemotron 3 Super (120B-A12B) and Mistral Small 4 (119B-A6B) in agentic coding benchmarks.

Can I run North Mini Code locally? With only 3B active parameters out of 30B total, the inference cost is closer to a 3B model. This means it can run on consumer GPUs and even some high-end laptops, though the full 30B checkpoint requires more memory.

Is North Mini Code open source? Yes, it’s released under the Apache 2.0 license on Hugging Face, making it free for both research and commercial use.

Best open source LLMs for coding in 2026. Comparing DeepSeek, Qwen, Llama, and other open-weight coding models
Best AI coding agents in 2026. Comparing Claude Code, Cursor, Copilot, and OpenCode for development workflows
Making FlashAttention-4 faster for inference. GPU-level optimizations that benefit model inference performance

This article was published on Agentic Up (https://agenticup.dev): practical guides for developers and founders building with AI agents. Reach me at [email protected]

Cohere North Mini Code: a 30B MoE model for agentic coding

How does Cohere North Mini Code architecture work?

How was Cohere North Mini Code trained?

How does North Mini Code perform on benchmarks?

What this means for AI engineering

FAQ

Related Posts

Get the brief on AI agents