📌 MAROKO133 Breaking AI: Moonshot's Kimi K2 Thinking emerges as leading open-source model
Even as concern and skepticism grow over U.S. AI startup OpenAI's buildout strategy and heavy spending commitments, Chinese open-source AI providers are escalating their competition. One has even caught up to GPT-5, OpenAI's flagship paid proprietary model, on key third-party performance benchmarks with a new, freely available model.
The Chinese AI startup Moonshot AI’s new Kimi K2 Thinking model, released today, has vaulted past both proprietary and open-weight competitors to claim the top position in reasoning, coding, and agentic-tool benchmarks.
Despite being fully open-source, the model now outperforms OpenAI’s GPT-5, Anthropic’s Claude Sonnet 4.5 (Thinking mode), and xAI's Grok-4 on several standard evaluations — an inflection point for the competitiveness of open AI systems.
Developers can access the model via platform.moonshot.ai and kimi.com; weights and code are hosted on Hugging Face. The open release includes APIs for chat, reasoning, and multi-tool workflows.
Users can also try Kimi K2 Thinking directly through Moonshot's own ChatGPT-like chatbot site, kimi.com, as well as in a hosted Hugging Face space.
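For developers integrating through the API rather than the chat interface, Moonshot's platform has historically exposed an OpenAI-compatible chat-completions endpoint. The snippet below is a minimal sketch of such a call; the base URL and the `kimi-k2-thinking` model identifier are assumptions and should be checked against platform.moonshot.ai before use.

```python
# Minimal sketch of an OpenAI-compatible chat call to Kimi K2 Thinking.
# The base URL and model name are assumptions; verify on platform.moonshot.ai.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_MOONSHOT_API_KEY",        # key issued on platform.moonshot.ai
    base_url="https://api.moonshot.ai/v1",  # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="kimi-k2-thinking",               # assumed model identifier
    messages=[
        {"role": "system", "content": "You are a careful reasoning assistant."},
        {"role": "user", "content": "Summarize the trade-offs of sparse MoE models."},
    ],
    temperature=0.6,
)

print(response.choices[0].message.content)
```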
Modified Standard Open Source License
Moonshot AI has formally released Kimi K2 Thinking under a Modified MIT License on Hugging Face.
The license grants full commercial and derivative rights — meaning individual researchers and developers working on behalf of enterprise clients can access it freely and use it in commercial applications — but adds one restriction:
"If the software or any derivative product serves over 100 million monthly active users or generates over $20 million USD per month in revenue, the deployer must prominently display 'Kimi K2' on the product’s user interface."
For most research and enterprise applications, this clause functions as a light-touch attribution requirement while preserving the freedoms of standard MIT licensing.
It makes K2 Thinking one of the most permissively licensed frontier-class models currently available.
A New Benchmark Leader
Kimi K2 Thinking is a Mixture-of-Experts (MoE) model built around one trillion parameters, of which 32 billion activate per inference.
It combines long-horizon reasoning with structured tool use, executing up to 200–300 sequential tool calls without human intervention.
According to Moonshot’s published test results, K2 Thinking achieved:
- 44.9 % on Humanity’s Last Exam (HLE), a state-of-the-art score;
- 60.2 % on BrowseComp, an agentic web-search and reasoning test;
- 71.3 % on SWE-Bench Verified and 83.1 % on LiveCodeBench v6, key coding evaluations;
- 56.3 % on Seal-0, a benchmark for real-world information retrieval.
Across these tasks, K2 Thinking consistently outperforms GPT-5’s corresponding scores and surpasses the previous open-weight leader MiniMax-M2—released just weeks earlier by Chinese rival MiniMax AI.
Open Model Outperforms Proprietary Systems
GPT-5 and Claude Sonnet 4.5 Thinking remain the leading proprietary “thinking” models.
Yet in the same benchmark suite, K2 Thinking’s agentic reasoning scores exceed both: for instance, on BrowseComp the open model’s 60.2 % decisively leads GPT-5’s 54.9 % and Claude 4.5’s 24.1 %.
K2 Thinking also edges GPT-5 in GPQA Diamond (85.7 % vs 84.5 %) and matches it on mathematical reasoning tasks such as AIME 2025 and HMMT 2025.
Only in certain heavy-mode configurations—where GPT-5 aggregates multiple trajectories—does the proprietary model regain parity.
That Moonshot’s fully open-weight release can meet or exceed GPT-5’s scores marks a turning point. The gap between closed frontier systems and publicly available models has effectively collapsed for high-end reasoning and coding.
Surpassing MiniMax-M2: The Previous Open-Source Benchmark
When VentureBeat profiled MiniMax-M2 just a week and a half ago, it was hailed as the “new king of open-source LLMs,” achieving top scores among open-weight systems:
- τ²-Bench: 77.2
- BrowseComp: 44.0
- FinSearchComp-global: 65.5
- SWE-Bench Verified: 69.4
Those results placed MiniMax-M2 near GPT-5-level capability in agentic tool use. Yet Kimi K2 Thinking now eclipses them by wide margins.
Its BrowseComp result of 60.2 % exceeds M2’s 44.0 %, and its SWE-Bench Verified 71.3 % edges out M2’s 69.4 %. Even on financial-reasoning tasks such as FinSearchComp-T3 (47.4 %), K2 Thinking performs comparably while maintaining superior general-purpose reasoning.
Technically, both models adopt sparse Mixture-of-Experts architectures for compute efficiency, but Moonshot’s network activates more experts and deploys advanced quantization-aware training (INT4 QAT).
This design doubles inference speed relative to standard precision without degrading accuracy, which is critical for long “thinking-token” sessions that reach 256K-token context windows.
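To put the quantization claim in perspective, the back-of-envelope arithmetic below compares weight-storage footprints at BF16 and INT4 for the parameter counts reported above. It is illustrative only: it ignores KV-cache, activations, and expert-routing overhead, and it does not model the speedup Moonshot attributes to quantization-aware training.

```python
# Rough, illustrative arithmetic for weight storage only; real deployments also need
# KV-cache, activation memory, and expert-routing overhead that is not modeled here.
TOTAL_PARAMS = 1.0e12   # ~1 trillion parameters in the full MoE
ACTIVE_PARAMS = 32e9    # ~32 billion parameters active per token

def weight_gb(params: float, bits: int) -> float:
    """Gigabytes needed to store `params` weights at `bits` bits each."""
    return params * bits / 8 / 1e9

for label, bits in (("BF16", 16), ("INT4", 4)):
    print(f"{label}: full weights ~{weight_gb(TOTAL_PARAMS, bits):,.0f} GB, "
          f"active slice ~{weight_gb(ACTIVE_PARAMS, bits):,.0f} GB")
# INT4 cuts weight storage roughly 4x versus BF16; the reported ~2x inference speedup
# from quantization-aware training is a separate effect this sketch does not capture.
```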
Agentic Reasoning and Tool Use
K2 Thinking’s defining capability lies in its explicit reasoning trace. The model outputs an auxiliary field, reasoning_content, revealing intermediate logic before each final response. This transparency preserves coherence across long multi-turn tasks and multi-step tool calls.
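If the API surfaces that trace the way Moonshot describes, reading it alongside the final answer might look like the sketch below. The `reasoning_content` field name is taken from the description above; the exact response shape may vary, so the access is kept defensive, and the endpoint and model identifier remain assumptions.

```python
# Sketch: surfacing the model's reasoning trace next to its final answer.
# Field name `reasoning_content` comes from Moonshot's description; endpoint and
# model identifier are assumptions to verify on platform.moonshot.ai.
from openai import OpenAI

client = OpenAI(api_key="YOUR_MOONSHOT_API_KEY", base_url="https://api.moonshot.ai/v1")

resp = client.chat.completions.create(
    model="kimi-k2-thinking",  # assumed identifier
    messages=[{"role": "user", "content": "Outline a three-step literature search."}],
)

msg = resp.choices[0].message
reasoning = getattr(msg, "reasoning_content", None)  # intermediate logic, if exposed
if reasoning:
    print("--- reasoning trace ---\n" + reasoning)
print("--- final answer ---\n" + (msg.content or ""))
```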
A reference implementation published by Moonshot demonstrates how the model autonomously conducts a “daily news report” workflow: invoking date and web-search tools, analyzing retrieved content, and composing structured output—all while maintaining internal reasoning state.
This end-to-end autonomy enables the model to plan, search, execute, and synthesize evidence across hundreds of steps, mirroring the emerging class of “agentic AI” systems that operate with minimal supervision.
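The general shape of such a workflow can be sketched with a standard tool-calling loop. Everything below is a simplified stand-in rather than Moonshot's reference code: the date and web-search tools are hypothetical stubs, the endpoint and model identifier are assumptions, and the loop is capped far below the hundreds of steps the model reportedly sustains.

```python
# Hedged sketch of an agentic "daily news report" loop: the model requests tools,
# the host executes them, and results are fed back until a final report is emitted.
import json
from datetime import date
from openai import OpenAI

client = OpenAI(api_key="YOUR_MOONSHOT_API_KEY", base_url="https://api.moonshot.ai/v1")

def get_current_date() -> str:
    return date.today().isoformat()

def web_search(query: str) -> str:
    return f"[stub results for: {query}]"  # placeholder; swap in a real search backend

TOOLS = [
    {"type": "function", "function": {
        "name": "get_current_date",
        "description": "Return today's date in ISO format.",
        "parameters": {"type": "object", "properties": {}}}},
    {"type": "function", "function": {
        "name": "web_search",
        "description": "Search the web and return result snippets.",
        "parameters": {"type": "object",
                       "properties": {"query": {"type": "string"}},
                       "required": ["query"]}}},
]
DISPATCH = {"get_current_date": get_current_date, "web_search": web_search}

messages = [{"role": "user", "content": "Compile a short daily news report on AI."}]
for _ in range(20):  # small cap; the article reports 200-300 sequential calls
    resp = client.chat.completions.create(
        model="kimi-k2-thinking", messages=messages, tools=TOOLS)
    msg = resp.choices[0].message
    messages.append(msg)
    if not msg.tool_calls:           # no further tools requested: report is finished
        print(msg.content)
        break
    for call in msg.tool_calls:      # execute each requested tool and return results
        args = json.loads(call.function.arguments or "{}")
        result = DISPATCH[call.function.name](**args)
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
```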
Efficiency and Access
Despite its trillion-parameter scale, K2 Thinking’s runtime cost remains modest. Moonshot lists usage at:
- $0.15 / 1M tokens (input, cache hit)
- $0.60 / 1M tokens (input, cache miss)
- $2.50 / 1M tokens (output)
These rates are competitive even against MiniMax-M2’s $0.30 input / $1.20 output pricing—and an order of magnitude below GPT-5 ($1.25 input / $10 output).
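As a quick sanity check on what those per-million-token rates imply per request, the arithmetic below prices a hypothetical call with 50,000 input tokens and 5,000 output tokens; the rates are the ones listed above and may not match live billing exactly.

```python
# Illustrative cost arithmetic using the listed per-1M-token rates (hypothetical workload).
def request_cost(input_toks: int, output_toks: int, in_rate: float, out_rate: float) -> float:
    """Cost in USD for one request, with rates quoted per 1M tokens."""
    return (input_toks * in_rate + output_toks * out_rate) / 1_000_000

IN_TOKS, OUT_TOKS = 50_000, 5_000
print("K2 (cache hit):  $%.4f" % request_cost(IN_TOKS, OUT_TOKS, 0.15, 2.50))
print("K2 (cache miss): $%.4f" % request_cost(IN_TOKS, OUT_TOKS, 0.60, 2.50))
print("MiniMax-M2:      $%.4f" % request_cost(IN_TOKS, OUT_TOKS, 0.30, 1.20))
print("GPT-5:           $%.4f" % request_cost(IN_TOKS, OUT_TOKS, 1.25, 10.00))
```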
Comparative Context: Open-Weight Acceleration
The rapid succession of M2 and K2 Thinking illustrates how quickly open-source research is catching frontier systems. MiniMax-M2 demonstrated that open models could approach GPT-5-class agentic capability at a fraction of the compute cost. Moonshot has now advanced that frontier further, pushing open weights beyond parity into outright leadership.
Both models rely on sparse activation for efficiency, but K2 Thinking’s higher activation count (32 B vs 10 B active parameters) yields stronger r…
Content automatically shortened.
🔗 Source: venturebeat.com
📌 MAROKO133 Breaking AI: Sam Altman Says That in a Few Years, a Whole Company Could Be Run by AI, Including the CEO
OpenAI CEO Sam Altman says that an era when entire companies are run by AI models is nearly upon us. And if he has his way, it’ll be OpenAI leading the charge, even if it means losing his job.
“Shame on me if OpenAI isn’t the first big company run by an AI CEO,” Altman said on an episode of the “Conversations with Tyler” podcast recorded last month and released Wednesday.
Asked how long it will be until a large division at the company is 85 percent run by AI or more, Altman offered a bold prediction.
“Some small single digit number of years — not very far,” he said.
But when the host predicted there would be billion-dollar companies “run by two or three people with AIs” within two and a half years, Altman seemed to move up the timeline.
“I think the AI can do it sooner than that,” he said.
It’s another classic big boast from Altman, who rarely shies away from sweeping pronouncements about the AI industry and how it will shape the world — both good and bad. He has frequently teased that OpenAI is on the verge of achieving artificial general intelligence, or AGI, a hypothetical AI system that surpasses human intelligence in virtually all aspects. He has written an entire manifesto detailing how AI will usher in a utopic future of “massive prosperity” for all.
His doomsaying is equally prolific: Altman also warns that AI will destroy entire categories of jobs, could cause a “fraud crisis,” implode the economy, or even end the world, if we’re not careful.
By Altman’s standards, predicting AI-run companies are right around the corner is pretty tame. One thing he sounds most certain about is AI far surpassing the performance of human CEOs, himself included. This “clearly will happen someday,” Altman said.
He isn’t entirely self-effacing, however. Altman concedes the host’s point that the public-facing role of the CEO is pretty important, which is undoubtedly true for OpenAI. In no small part thanks to Altman’s grand promises, it’s capitalized on its hype to garner hundreds of billions of dollars in valuation even as it continues to lose billions of dollars every quarter. Perhaps Altman could stay the public face, he imagined out loud, while an AI makes all the big decisions.
AI-controlled companies, in sum, are a foregone conclusion in Altman’s eyes. The main roadblock to that future, he said, is people’s reluctance to trust AI systems over humans, “even if they shouldn’t.”
“It may take much longer for society to get really comfortable with this,” he added, “but on the actual decision-making for most things, maybe AI is pretty good pretty soon.”
Perhaps it’s not as far-fetched as it sounds. It’s not as if human CEOs have a stellar track record of being universally competent operators. And what better way to justify carrying out harsh and unpopular changes at a company than by saying it was decided by an impartial AI model? Guess we’ll have to wait “some small single digit number of years” to find out.
More on AI: OpenAI Exec Says It Could Use Some Financial Support From the Government
🔗 Source: futurism.com
🤖 MAROKO133 Note
This article is an automated summary drawn from several trusted sources. We pick trending topics so you stay up to date without missing anything.
✅ Next update in 30 minutes, a random theme awaits!