March 2026 · Chinese AI Frontier Comparison

M2.7 versus K2.5

MiniMax's self-evolving agent engine vs Moonshot's visual swarm intelligence. A head-to-head for builders choosing their next agentic backbone.

01

Identity

MiniMax M2.7
MiniMax · Beijing · Released Mar 18, 2026
Architecture: Proprietary (dense, ~10B active)
Context Window: 205K tokens
Modality: Text → Text
Open Weights: No (proprietary)
Input Price: $0.30 / 1M tokens
Output Price: $1.20 / 1M tokens
Speed: ~48 tok/s (+ highspeed variant)
Key Innovation: Self-evolving agent harness
Kimi K2.5
Moonshot AI · Beijing · Released Jan 27, 2026
Architecture: MoE · 1T total / 32B active
Context Window: 256K tokens
Modality: Text + Image + Video → Text
Open Weights: Yes (Modified MIT)
Input Price: $0.60 / 1M tokens
Output Price: $2.50–3.00 / 1M tokens
Speed: ~46 tok/s
Key Innovation: Agent Swarm (100 sub-agents)
02

Benchmarks

| Benchmark | Description | M2.7 | K2.5 | Verdict |
|---|---|---|---|---|
| SWE-Pro | Hard real-world SW engineering | 56.2% | — | M2.7 |
| SWE-Bench Verified | GitHub issue resolution | ~55%* | 76.8% | K2.5 |
| LiveCodeBench | Competitive programming | — | 85.0% | K2.5 |
| Terminal-Bench 2 | Complex engineering systems | 57.0% | — | M2.7 |
| VIBE-Pro | End-to-end project delivery | 55.6% | — | M2.7 |
| HLE (with tools) | Humanity's Last Exam | — | 50.2% | K2.5 |
| BrowseComp | Web navigation & search | — | 78.4% | K2.5 |
| GDPval-AA | Office productivity ELO | 1495 | — | M2.7 |
| Skill Adherence | Complex skill following (40+ skills) | 97% | — | M2.7 |
| MMMU Pro | Multimodal academic understanding | N/A (text only) | 78.5% | K2.5 |
| MathVision | Visual math reasoning | N/A | 84.2% | K2.5 |
| AA Intelligence Index | Artificial Analysis composite | 50 | 47 | M2.7 |

* M2.7 is text-only and not benchmarked on many vision/agentic tasks that K2.5 excels at. "—" means no published score. Scores from official reports and Artificial Analysis.

03

Pricing

| Price (per 1M tokens) | MiniMax M2.7 | Kimi K2.5 |
|---|---|---|
| Input | $0.30 | $0.60 |
| Output | $1.20 | $2.50–3.00 |
| Blended | ~$0.53 (≈2× cheaper) | ~$1.20 |
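The blended figures are consistent with the common 3:1 input:output token weighting used by Artificial Analysis-style indexes. A quick sanity check; the ratio is an assumption for illustration, not a number published by either vendor:

```python
def blended_price(input_price: float, output_price: float, ratio: float = 3.0) -> float:
    """Blended $/1M tokens, assuming `ratio` input tokens per output token."""
    return (ratio * input_price + output_price) / (ratio + 1)

# M2.7: (3 * 0.30 + 1.20) / 4 = 0.525  -> ~$0.53
# K2.5: (3 * 0.60 + 3.00) / 4 = 1.20   -> ~$1.20
```

Shift the ratio toward output-heavy agentic workloads and M2.7's advantage widens, since the output-price gap is larger than the input-price gap.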
04

Agentic Architecture

Agent Paradigm
Fundamentally different philosophies

M2.7 focuses on self-evolving harness engineering — the model refines its own scaffolding, memory, skills, and tool-selection loops across 100+ autonomous improvement cycles. It's a single powerful agent that gets better at using its own environment.

K2.5 focuses on swarm coordination — spawning up to 100 parallel sub-agents, each with independent tool access, to decompose and conquer complex tasks. It's a coordinator that scales horizontally.
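The two philosophies can be caricatured in a few lines. This is an illustrative sketch, not either vendor's actual API; every name below is hypothetical:

```python
import asyncio

async def single_agent_loop(task: str, max_cycles: int = 3) -> str:
    """M2.7-style: one agent that acts, then refines its own harness each cycle."""
    harness = {"memory": [], "skills": ["search", "edit"]}
    result = task
    for cycle in range(max_cycles):
        result = f"refined({result})"               # act on the task
        harness["memory"].append(f"cycle-{cycle}")  # evolve scaffolding/memory
    return result

async def swarm(task: str, n_subagents: int = 4) -> list[str]:
    """K2.5-style: decompose the task and fan out to parallel sub-agents."""
    async def sub_agent(subtask: str) -> str:
        return f"done({subtask})"                   # each has its own tool access
    subtasks = [f"{task}/part-{i}" for i in range(n_subagents)]
    return await asyncio.gather(*(sub_agent(s) for s in subtasks))
```

The practical consequence: M2.7 gets deeper on one thread of work over time, while K2.5 gets wider on one task right now.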

Tool Use & Skill Adherence
M2.7 wins on reliability

M2.7's 97% adherence across 40+ complex skills (each >2,000 tokens) is the headline number. It was explicitly trained to build and optimize its own agent harness. For harness engineering, the layer you care most about, M2.7 is purpose-built.

K2.5 handles 200–300 sequential tool calls without drift, which is impressive for long-horizon tasks, but doesn't report comparable skill-following metrics.
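In practice, a harness-side skill system amounts to injecting long instruction blocks into the system prompt and scoring outputs against per-skill checks. A minimal hypothetical sketch; none of these names are real MiniMax or Moonshot APIs:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Skill:
    name: str
    instructions: str             # often >2,000 tokens each in real harnesses
    check: Callable[[str], bool]  # did the model's output follow the skill?

def build_system_prompt(skills: list[Skill]) -> str:
    """Concatenate every skill's instructions into the system prompt."""
    blocks = [f"## Skill: {s.name}\n{s.instructions}" for s in skills]
    return "You are an agent. Follow every skill below.\n\n" + "\n\n".join(blocks)

def adherence(output: str, skills: list[Skill]) -> float:
    """Fraction of skills whose check passes (the metric behind M2.7's 97%)."""
    return sum(s.check(output) for s in skills) / len(skills)
```

This is why the adherence number matters more than raw benchmark scores for harness builders: every failed check is a broken pipeline step you have to retry or repair downstream.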

Parallelism & Throughput
K2.5 wins on scale

Agent Swarm's 100 parallel sub-agents, coordinating up to 1,500 steps, deliver a 4.5× speedup on parallelizable tasks. For research, batch processing, and multi-source analysis, this is a game-changer.

M2.7 operates as a single-threaded agent loop. Fast per token (~100 tok/s with the highspeed variant), but fundamentally serial.

Self-Improvement
M2.7 is unique here

M2.7 is the first model to demonstrably participate in its own training loop — updating memory, building skills, running RL experiments, and refining its own harness. 30–50% of its development workflow was self-directed.

K2.5 doesn't claim self-improvement capabilities. Its swarm agents are disposable and stateless.

Multimodality
K2.5 wins, no contest

K2.5 has native vision (MoonViT, 400M params) trained on 15T mixed visual+text tokens. It processes images, video, PDFs, and does vision-grounded coding (Figma → React).

M2.7 is text-only. No image input, no video understanding. If your agents need to see, M2.7 can't help.

Open Source & Self-hosting
K2.5 wins on sovereignty

K2.5 is fully open-weight under Modified MIT. Deploy on your own infra with vLLM/SGLang. Full data sovereignty. Commercial use is free below 100M MAU / $20M MRR.

M2.7 is proprietary API-only. You're locked into MiniMax's infrastructure. No self-hosting, no weight access.
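For reference, the vLLM path for an open-weight model is roughly a one-liner. This is a deployment sketch only: the HF repo id and parallelism size below are assumptions, and a 1T-parameter MoE realistically needs a multi-node or heavily quantized setup, so check Moonshot's release page for the real values:

```shell
# Hypothetical repo id; --tensor-parallel-size must match your GPU count,
# and 262144 = the model's advertised 256K context window.
vllm serve moonshotai/Kimi-K2.5 \
    --tensor-parallel-size 8 \
    --max-model-len 262144
```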

05

Strengths & Weaknesses

MiniMax M2.7

Strengths

  • Insane cost efficiency — 50× cheaper than Opus on input, near frontier on SWE-Pro
  • 97% skill adherence makes it the most reliable harness backend available
  • Self-evolving architecture — model improves its own scaffolding
  • Best office productivity (ELO 1495) — Excel, PPT, Word editing
  • Highspeed variant (~100 tok/s) for latency-sensitive agent loops
  • Compatible with Claude Code, Cursor, Kilo Code, Roo Code as a backend

Weaknesses

  • Text-only — zero vision capability, can't process images/video
  • Proprietary & closed — no self-hosting, API lock-in
  • 205K context is smaller than K2.5's 256K
  • Very verbose output (~87M tokens on AA eval) — burns tokens
  • Chinese censorship on politically sensitive topics
  • Standard mode is slower at 48 tok/s

Kimi K2.5

Strengths

  • Native multimodal — image, video, PDF input via MoonViT
  • Agent Swarm: 100 parallel sub-agents, 4.5× speedup on batch tasks
  • Best open-source coding model (76.8% SWE-Bench, 85% LiveCodeBench)
  • Open weights (Modified MIT) — full self-hosting with vLLM/SGLang
  • Dominant on web navigation (BrowseComp 78.4%, beat GPT-5.2)
  • 4 operational modes: Instant, Thinking, Agent, Agent Swarm
  • Vision-to-code: Figma mockup → React/Vue components

Weaknesses

  • 2× more expensive than M2.7 on blended token cost
  • Slow median response time (29.2s vs ~4.6s for competitors)
  • 1T params means self-hosting requires serious GPU infra
  • English prose quality rated ~8.5/10 vs 9/10 for GPT
  • Chinese censorship on political content
  • Moonshot accused by Anthropic of training data scraping (Feb 2026)
  • Weaker ecosystem/community presence in Western markets

06

Verdict for AI Agent Builders

The Bottom Line

Pick M2.7 When

You need a cheap, reliable harness backend that follows complex skill instructions faithfully. Ideal for coding agents, office automation pipelines, agent orchestration where the model is a cog in your harness — not the orchestrator itself. At $0.30/$1.20 per 1M tokens with near-Opus coding quality, it's the best bang-for-buck reasoning engine for serial agentic workflows. The self-evolving harness pattern is genuinely novel and aligns with the "harness is the moat" philosophy.

Pick K2.5 When

You need multimodal perception + parallel execution. If your agents need to see (screenshots, documents, video), K2.5 is the only choice here. Agent Swarm unlocks massively parallel research, web crawling, and batch processing that a single-agent loop simply can't match. Open weights mean full data sovereignty for enterprise deployments. The vision-to-code pipeline (Figma → frontend) is production-ready. Best fit for autonomous research agents and multi-source analysis.