## Identity
## Benchmarks
| Benchmark | M2.7 | K2.5 | Verdict |
|---|---|---|---|
| SWE-Pro (hard real-world SW engineering) | 56.2% | — | M2.7 |
| SWE-Bench Verified (GitHub issue resolution) | ~55%* | 76.8% | K2.5 |
| LiveCodeBench (competitive programming) | — | 85.0% | K2.5 |
| Terminal-Bench 2 (complex engineering systems) | 57.0% | — | M2.7 |
| VIBE-Pro (end-to-end project delivery) | 55.6% | — | M2.7 |
| HLE with tools (Humanity's Last Exam) | — | 50.2% | K2.5 |
| BrowseComp (web navigation & search) | — | 78.4% | K2.5 |
| GDPval-AA (office productivity ELO) | 1495 | — | M2.7 |
| Skill Adherence (complex skill following, 40+ skills) | 97% | — | M2.7 |
| MMMU Pro (multimodal academic understanding) | N/A (text-only) | 78.5% | K2.5 |
| MathVision (visual math reasoning) | N/A (text-only) | 84.2% | K2.5 |
| AA Intelligence Index (Artificial Analysis composite) | 50 | 47 | M2.7 |
* M2.7 is text-only and not benchmarked on many vision/agentic tasks that K2.5 excels at. "—" means no published score. Scores from official reports and Artificial Analysis.
## Pricing
## Agentic Architecture
M2.7 focuses on self-evolving harness engineering — the model refines its own scaffolding, memory, skills, and tool-selection loops across 100+ autonomous improvement cycles. It's a single powerful agent that gets better at using its own environment.
K2.5 focuses on swarm coordination — spawning up to 100 parallel sub-agents, each with independent tool access, to decompose and conquer complex tasks. It's a coordinator that scales horizontally.
M2.7's 97% adherence across 40+ complex skills (each over 2,000 tokens) is the headline number: the model was trained to build and optimize its own agent harness. For harness engineering, the layer you care most about, M2.7 is purpose-built.
K2.5 handles 200–300 sequential tool calls without drift, which is impressive for long-horizon tasks, but doesn't report comparable skill-following metrics.
Agent Swarm's 100 parallel sub-agents, running up to 1,500 coordinated steps, deliver a 4.5× speedup on parallelizable tasks. For research, batch processing, and multi-source analysis, this is a game-changer.
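That 4.5× figure can be sanity-checked against Amdahl's law. This is an assumption on our part, not a breakdown either vendor publishes, but it shows what the number implies about the workloads:

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Overall speedup when a fraction p of the work runs on n parallel workers."""
    return 1.0 / ((1.0 - p) + p / n)

# With n = 100 sub-agents, solving 1/((1-p) + p/100) = 4.5 for p gives the
# parallelizable fraction the reported speedup would imply:
p = (1 - 1 / 4.5) / (1 - 1 / 100)
print(round(p, 3))  # roughly 0.79, i.e. ~79% of the workload parallelizable
```

In other words, a 4.5× end-to-end speedup from 100 workers is consistent with roughly four-fifths of the task being fan-out-able and the rest stuck in the serial coordinator.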
M2.7 operates as a single-threaded agent loop. It is fast per token (~100 tok/s in the highspeed variant), but fundamentally serial.
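The architectural difference boils down to loop shape. A minimal sketch (hypothetical `run_tool` stands in for an LLM-plus-tools sub-agent; neither vendor's actual API is shown):

```python
import asyncio

async def run_tool(task: str) -> str:
    # Placeholder for a real tool call / sub-agent invocation.
    await asyncio.sleep(0.01)
    return f"result:{task}"

async def serial_agent(tasks: list[str]) -> list[str]:
    # M2.7-style loop: one tool call at a time, each result feeding the next step.
    return [await run_tool(t) for t in tasks]

async def swarm_agent(tasks: list[str]) -> list[str]:
    # K2.5-style fan-out: independent, stateless sub-agents run concurrently.
    return await asyncio.gather(*(run_tool(t) for t in tasks))

results = asyncio.run(swarm_agent(["a", "b", "c"]))
```

The serial loop can condition every call on the previous result; the swarm can't, which is why it shines on decomposable work (research, crawling, batch analysis) and not on tightly sequential engineering tasks.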
M2.7 is the first model to demonstrably participate in its own training loop — updating memory, building skills, running RL experiments, and refining its own harness. 30–50% of its development workflow was self-directed.
K2.5 doesn't claim self-improvement capabilities. Its swarm agents are disposable and stateless.
K2.5 has native vision (MoonViT, 400M params) trained on 15T mixed visual+text tokens. It processes images, video, PDFs, and does vision-grounded coding (Figma → React).
M2.7 is text-only. No image input, no video understanding. If your agents need to see, M2.7 can't help.
K2.5 is fully open-weight under a Modified MIT license. Deploy on your own infra with vLLM/SGLang, with full data sovereignty. Commercial use is free below the 100M MAU / $20M MRR thresholds.
M2.7 is proprietary API-only. You're locked into MiniMax's infrastructure. No self-hosting, no weight access.
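For the open-weight path, self-hosting is a one-command affair with vLLM's OpenAI-compatible server. A sketch only: the model ID and parallelism settings below are assumptions, so check Moonshot's model card for the actual repo name and hardware requirements (a 1T-param model needs a multi-GPU node at minimum):

```shell
pip install vllm

# Serve the checkpoint behind an OpenAI-compatible HTTP API.
# "moonshotai/Kimi-K2.5" is a placeholder model ID -- verify on Hugging Face.
vllm serve moonshotai/Kimi-K2.5 \
  --tensor-parallel-size 8 \
  --max-model-len 262144   # 256K context window
```

Any OpenAI-compatible client can then point at `http://localhost:8000/v1`, which is what makes the "full data sovereignty" claim practical rather than theoretical.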
## Strengths & Weaknesses
### MiniMax M2.7

#### Strengths
- Exceptional cost efficiency: 50× cheaper than Opus on input tokens, near-frontier on SWE-Pro
- 97% skill adherence makes it the most reliable harness backend available
- Self-evolving architecture — model improves its own scaffolding
- Best office productivity (ELO 1495) — Excel, PPT, Word editing
- Highspeed variant (~100 tok/s) for latency-sensitive agent loops
- Compatible with Claude Code, Cursor, Kilo Code, Roo Code as a backend
#### Weaknesses
- Text-only — zero vision capability, can't process images/video
- Proprietary & closed — no self-hosting, API lock-in
- 205K context is smaller than K2.5's 256K
- Very verbose output (~87M tokens on AA eval) — burns tokens
- Chinese censorship on politically sensitive topics
- Standard mode is slower at 48 tok/s
### Kimi K2.5

#### Strengths
- Native multimodal — image, video, PDF input via MoonViT
- Agent Swarm: 100 parallel sub-agents, 4.5× speedup on batch tasks
- Best open-source coding model (76.8% SWE-Bench, 85% LiveCodeBench)
- Open weights (Modified MIT) — full self-hosting with vLLM/SGLang
- Dominant on web navigation (BrowseComp 78.4%, beat GPT-5.2)
- 4 operational modes: Instant, Thinking, Agent, Agent Swarm
- Vision-to-code: Figma mockup → React/Vue components
#### Weaknesses
- 2× more expensive than M2.7 on blended token cost
- Slow median response time (29.2s vs ~4.6s for competitors)
- 1T params means self-hosting requires serious GPU infra
- English prose quality rated ~8.5/10 vs 9/10 for GPT
- Chinese censorship on political content
- Moonshot accused by Anthropic of training data scraping (Feb 2026)
- Weaker ecosystem/community presence in Western markets
## Verdict for AI Agent Builders

### The Bottom Line

#### Pick M2.7 When
You need a cheap, reliable harness backend that follows complex skill instructions faithfully. Ideal for coding agents, office automation pipelines, agent orchestration where the model is a cog in your harness — not the orchestrator itself. At $0.30/$1.20 per 1M tokens with near-Opus coding quality, it's the best bang-for-buck reasoning engine for serial agentic workflows. The self-evolving harness pattern is genuinely novel and aligns with the "harness is the moat" philosophy.
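To make the bang-for-buck claim concrete, here is the arithmetic at M2.7's listed rates. The traffic mix is a hypothetical example, not a published workload:

```python
# M2.7 list pricing from this comparison: $0.30 input / $1.20 output per 1M tokens.
INPUT_PER_M = 0.30
OUTPUT_PER_M = 1.20

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one run at the per-million-token rates above."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# A hypothetical agent run consuming 800k input + 200k output tokens:
print(cost_usd(800_000, 200_000))  # 0.24 + 0.24 = $0.48
```

Under half a dollar for a million tokens of agent traffic is the whole argument: at that price, the verbosity weakness noted above costs cents, not dollars.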
#### Pick K2.5 When
You need multimodal perception + parallel execution. If your agents need to see (screenshots, documents, video), K2.5 is the only choice here. Agent Swarm unlocks massively parallel research, web crawling, and batch processing that a single-agent loop simply can't match. Open weights mean full data sovereignty for enterprise deployments. The vision-to-code pipeline (Figma → frontend) is production-ready. Best fit for autonomous research agents and multi-source analysis.