📌 MAROKO133 Update ai: AI’s capacity crunch: Latency risk, escalating costs, and the coming era of real market rates
The latest big headline in AI isn’t model size or multimodality — it’s the capacity crunch. At VentureBeat’s latest AI Impact stop in NYC, Val Bercovici, chief AI officer at WEKA, joined Matt Marshall, VentureBeat CEO, to discuss what it really takes to scale AI amid rising latency, cloud lock-in, and runaway costs.
Those forces, Bercovici argued, are pushing AI toward its own version of surge pricing. Uber famously introduced surge pricing, bringing real-time market rates to ridesharing for the first time. AI, he said, is headed toward the same economic reckoning, especially for inference, once the focus turns to profitability.
"We don't have real market rates today. We have subsidized rates. That’s been necessary to enable a lot of the innovation that’s been happening, but sooner or later — considering the trillions of dollars of capex we’re talking about right now, and the finite energy opex — real market rates are going to appear; perhaps next year, certainly by 2027," he said. "When they do, it will fundamentally change this industry and drive an even deeper, keener focus on efficiency."
The economics of the token explosion
"The first rule is that this is an industry where more is more. More tokens equal exponentially more business value," Bercovici said.
But so far, no one's figured out how to make that sustainable. The classic business triad — cost, quality, and speed — translates in AI to latency, cost, and accuracy (especially in output tokens). And accuracy is non-negotiable. That holds not only for consumer interactions with agents like ChatGPT, but for high-stakes use cases such as drug discovery and business workflows in heavily regulated industries like financial services and healthcare.
"That’s non-negotiable," Bercovici said. "You have to have a high amount of tokens for high inference accuracy, especially when you add security into the mix, guardrail models, and quality models. Then you’re trading off latency and cost. That’s where you have some flexibility. If you can tolerate high latency, and sometimes you can for consumer use cases, then you can have lower cost, with free tiers and low cost-plus tiers."
However, latency is a critical bottleneck for AI agents. “These agents now don't operate in any singular sense. You either have an agent swarm or no agentic activity at all,” Bercovici noted.
In a swarm, groups of agents work in parallel to complete a larger objective. An orchestrator agent — the smartest model — sits at the center, determining subtasks and key requirements: architecture choices, cloud vs. on-prem execution, performance constraints, and security considerations. The swarm then executes all subtasks, effectively spinning up numerous concurrent inference users in parallel sessions. Finally, evaluator models judge whether the overall task was successfully completed.
“These swarms go through what are called multiple turns, hundreds if not thousands of prompts and responses, until the swarm converges on an answer,” Bercovici said.
“And if you have a compound delay in those thousand turns, it becomes untenable. So latency is really, really important. And that means typically having to pay a high price today that's subsidized, and that's what's going to have to come down over time.”
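A minimal sketch of that swarm pattern, with a stubbed model call standing in for real inference; every name and the convergence check are illustrative, not any vendor's API:

```python
# Minimal orchestrator/swarm/evaluator sketch of the pattern described
# above. call_model() is a stub standing in for real inference; every
# name and the convergence check are illustrative, not a vendor API.
import asyncio

async def call_model(role: str, prompt: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for per-turn inference latency
    return f"[{role}] result for: {prompt}"

async def run_swarm(objective: str, max_turns: int = 5) -> str:
    for turn in range(max_turns):
        # The orchestrator (the smartest model) decomposes the objective.
        plan = await call_model("orchestrator", f"plan: {objective}")
        subtasks = [f"{plan} / subtask {i}" for i in range(4)]

        # The swarm executes subtasks in parallel, effectively spinning up
        # many concurrent inference sessions at once.
        results = await asyncio.gather(
            *(call_model("worker", t) for t in subtasks)
        )

        # Evaluator models judge whether the overall task succeeded.
        verdict = await call_model("evaluator", " | ".join(results))
        if "result" in verdict:  # toy stand-in for a real evaluation
            return verdict
    return "did not converge"

print(asyncio.run(run_swarm("summarize capacity-crunch economics")))
```

Every awaited call adds latency, and across hundreds or thousands of turns those delays compound into exactly the bottleneck described above.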
Reinforcement learning as the new paradigm
Until around May of this year, agents weren't that performant, Bercovici explained. Then context windows became large enough, and GPUs available enough, to support agents that could complete advanced tasks, like writing reliable software. It's now estimated that in some cases, 90% of software is generated by coding agents. Now that agents have essentially come of age, Bercovici noted, reinforcement learning is the new conversation among data scientists at some of the leading labs, such as OpenAI, Anthropic, and Google's Gemini team, who view it as a critical path forward in AI innovation.
"The current AI season is reinforcement learning. It blends many of the elements of training and inference into one unified workflow,” Bercovici said. “It’s the latest and greatest scaling law to this mythical milestone we’re all trying to reach called AGI — artificial general intelligence,” he added. "What’s fascinating to me is that you have to apply all the best practices of how you train models, plus all the best practices of how you infer models, to be able to iterate these thousands of reinforcement learning loops and advance the whole field."
The path to AI profitability
There's no one answer when it comes to building an infrastructure foundation that makes AI profitable, Bercovici said, since it's still an emerging field and there's no cookie-cutter approach. Going all on-prem may be the right choice for some, especially frontier model builders, while being cloud-native or running in a hybrid environment may be a better path for organizations that want to innovate with agility. Regardless of which path they choose initially, organizations will need to adapt their AI infrastructure strategy as their business needs evolve.
"Unit economics are what fundamentally matter here," said Bercovici. "We are definitely in a boom, or even in a bubble, you could say, in some cases, since the underlying AI economics are being subsidized. But that doesn’t mean that if tokens get more expensive, you’ll stop using them. You’ll just get very fine-grained in terms of how you use them."
Leaders should focus less on individual token pricing and more on transaction-level economics, where efficiency and impact become visible, Bercovici concluded.
The pivotal question enterprises and AI companies should be asking, Bercovici said, is “What is the real cost for my unit economics?”
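A back-of-envelope way to answer that question is to price the whole transaction rather than the token. The rates and task value below are made up for illustration:

```python
# Back-of-envelope transaction-level economics: the unit that matters is
# the completed business task, not the individual token. All rates and
# the task value below are made up for illustration.
PRICE_PER_1K_INPUT = 0.003    # USD per 1K input tokens (hypothetical)
PRICE_PER_1K_OUTPUT = 0.015   # USD per 1K output tokens (hypothetical)

def transaction_cost(turns: int, in_tok: int, out_tok: int) -> float:
    """Cost of one multi-turn agent transaction."""
    per_turn = (in_tok / 1000) * PRICE_PER_1K_INPUT \
             + (out_tok / 1000) * PRICE_PER_1K_OUTPUT
    return turns * per_turn

# A 500-turn swarm transaction weighed against the task's business value:
cost = transaction_cost(turns=500, in_tok=4000, out_tok=1000)
value_of_task = 50.00  # e.g., one automated support resolution
print(f"cost ${cost:.2f}, margin ${value_of_task - cost:.2f}")
# -> cost $13.50, margin $36.50
```

If rates rise to real market levels, the same arithmetic shows which transactions still pencil out, which is the fine-grained usage Bercovici predicts.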
Viewed through that lens, the path forward isn’t about doing less with AI — it’s about doing it smarter and more efficiently at scale.
🔗 Sumber: venturebeat.com
📌 MAROKO133 Update ai: Google Cloud updates its AI Agent Builder with new observability and governance features
Google Cloud has introduced a big update in a bid to keep AI developers on its Vertex AI platform for concepting, designing, building, testing, deploying and modifying AI agents in enterprise use cases.
The new features, announced today, include additional governance tools for enterprises, expanded capabilities for creating agents with just a few lines of code, state-of-the-art context management layers, one-click deployment, managed services for scaling production and evaluation, and support for agent identities.
Agent Builder, released last year during Google's annual Cloud Next event, provides a no-code platform for enterprises to create agents and connect them to orchestration frameworks like LangChain.
Google’s Agent Development Kit (ADK), which lets developers build agents “in under 100 lines of code,” can also be accessed through Agent Builder.
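As a sense of scale, an ADK agent can be sketched in a dozen lines. The following roughly follows the shape of ADK's published quickstart examples; exact class and parameter names should be verified against the current google-adk documentation, and the tool and agent here are illustrative:

```python
# A minimal agent in the shape of ADK's published quickstart examples,
# to show why "under 100 lines" is plausible. Class and parameter names
# should be checked against the current google-adk docs; the tool and
# agent here are illustrative.
from google.adk.agents import Agent

def lookup_order(order_id: str) -> dict:
    """Toy tool: return the status of an order (stubbed data)."""
    return {"order_id": order_id, "status": "shipped"}

root_agent = Agent(
    name="order_support_agent",       # illustrative name
    model="gemini-2.0-flash",         # model id used in ADK examples
    instruction="Answer order-status questions using the lookup tool.",
    tools=[lookup_order],
)
```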
“These new capabilities underscore our commitment to Agent Builder, and simplify the agent development process to meet developers where they are, no matter which tech stack they choose,” said Mike Clark, director of Product Management, Vertex AI Agent Builder.
Build agents faster
Part of Google’s pitch for Agent Builder’s new features is that enterprises can bake in orchestration even as they construct their agents.
“Building an agent from a concept to a working product involves complex orchestration,” said Clark.
The new capabilities, which are shipped with the ADK, include:
- SOTA context management layers, including Static, Turn, User, and Cache layers, so enterprises have more control over the agents' context
- Prebuilt plugins with customizable logic; one of the new plugins allows agents to recognize failed tool calls and "self-heal" by retrying the task with a different approach (sketched in the example after this list)
- Additional language support in the ADK, with Go joining the Python and Java support that launched with the ADK
- One-click deployment through the ADK command-line interface to move agents from a local environment to live testing with a single command
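The "self-heal" behavior named in the plugins bullet can be sketched generically as an ordered fallback over approaches. This shows the pattern only, not Google's actual plugin API:

```python
# Generic sketch of the "self-heal" behavior named in the plugins bullet
# above: detect a failed tool call and retry with a different approach.
# This shows the pattern only, not Google's actual plugin API.
from typing import Callable, Optional, Sequence

def self_healing_call(approaches: Sequence[Callable[[], str]]) -> str:
    """Try each approach in order until one succeeds."""
    last_error: Optional[Exception] = None
    for attempt in approaches:
        try:
            return attempt()
        except Exception as err:   # a failed tool call is detected here
            last_error = err       # fall through to the next approach
    raise RuntimeError("all approaches failed") from last_error

def flaky_primary() -> str:
    raise TimeoutError("primary tool timed out")

def fallback() -> str:
    return "fallback tool succeeded"

print(self_healing_call([flaky_primary, fallback]))
# -> fallback tool succeeded
```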
Governance layer
Enterprises require high accuracy; security; observability and auditability (what a program did and why); and steerability (control) in their production-grade AI agents.
While Google had observability features in the local development environment at launch, developers can now access these tools through the Agent Engine managed runtime dashboard.
The company said this brings cloud-based production monitoring to track token consumption, error rates and latency. Within this observability dashboard, enterprises can visualize the actions agents take and reproduce any issues.
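Those three signals can be illustrated with plain instrumentation around an agent call. This is a generic sketch, not the Agent Engine API:

```python
# Generic sketch of the three signals the dashboard tracks -- token
# consumption, error rate, and latency -- instrumented around an agent
# call. Plain Python for illustration; this is not the Agent Engine API.
import time

metrics = {"tokens": 0, "errors": 0, "calls": 0, "latency_s": []}

def observed_call(agent_fn, prompt: str) -> str:
    """Run one agent call while recording the three core signals."""
    metrics["calls"] += 1
    start = time.perf_counter()
    try:
        response, tokens_used = agent_fn(prompt)  # fn returns (text, tokens)
        metrics["tokens"] += tokens_used
        return response
    except Exception:
        metrics["errors"] += 1
        raise
    finally:
        metrics["latency_s"].append(time.perf_counter() - start)

observed_call(lambda p: (f"echo: {p}", 42), "hello")
error_rate = metrics["errors"] / metrics["calls"]
print(metrics["tokens"], error_rate, max(metrics["latency_s"]))
```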
Agent Engine will also have a new Evaluation Layer to help “simulate agent performance across a vast array of user interactions and situations.”
This governance layer will also include:
- Agent Identities, which Google said give "agents their own unique, native identities within Google Cloud"
- Model Armor, which blocks prompt injections and screens tool calls and agent responses
- Security Command Center, so admins can build an inventory of their agents to detect threats like unauthorized access
“These native identities provide a deep, built-in layer of control and a clear audit trail for all agent actions. These certificate-backed identities further strengthen your security as they cannot be impersonated and are tied directly to the agent's lifecycle, eliminating the risk of dormant accounts,” Clark said.
The battle of agent builders
It’s no surprise that model providers create platforms to build agents and bring them to production. The competition lies in how fast new tools and features are added.
Google’s Agent Builder competes with OpenAI’s open-source Agents SDK, which enables developers to create AI agents using non-OpenAI models.
There is also OpenAI’s recently announced AgentKit, which features an Agent Builder that enables companies to integrate agents into their applications easily.
Microsoft has its Azure AI Foundry, launched around this time last year for AI agent creation, and AWS also offers agent builders on its Bedrock platform, but Google is hoping its suite of new features will give it a competitive edge.
However, it isn’t just companies with their own models that court developers to build their AI agents within their platforms. Any enterprise service provider with an agent library also wants clients to make agents on their systems.
Capturing developer interest and keeping them within the ecosystem is the big battle between tech companies now, with features to make building and governing agents easier.
🔗 Sumber: venturebeat.com
🤖 MAROKO133 Note
This article is an automated summary compiled from several trusted sources. We pick trending topics so you stay up to date without falling behind.
✅ Next update in 30 minutes; a random theme awaits!