MAROKO133 Update ai: Together AI's ATLAS adaptive speculator delivers 400% faster inference

📌 MAROKO133 Hot ai: Together AI's ATLAS adaptive speculator delivers 400% faster inference

Enterprises expanding AI deployments are hitting an invisible performance wall. The culprit? Static speculators that can't keep up with shifting workloads.

Speculators are smaller AI models that work alongside large language models during inference. They draft multiple tokens ahead, which the main model then verifies in parallel. This technique (called speculative decoding) has become essential for enterprises trying to reduce inference costs and latency. Instead of generating tokens one at a time, the system can accept multiple tokens at once, dramatically improving throughput.
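
In pseudocode, one round of speculative decoding looks roughly like the sketch below. The `draft_next` and `target_next` functions are toy stand-ins for the speculator and the main model, not any vendor's actual API; the key idea is that the target verifies a whole batch of drafted tokens at once and keeps the longest matching prefix.

```python
from typing import Callable, List

def speculative_step(prefix: List[int],
                     draft_next: Callable[[List[int]], int],
                     target_next: Callable[[List[int]], int],
                     lookahead: int = 5) -> List[int]:
    """Draft `lookahead` tokens with the small model, then verify them
    with the large model. Returns the tokens accepted this round."""
    # 1) Draft cheaply and autoregressively with the small model.
    drafted, ctx = [], list(prefix)
    for _ in range(lookahead):
        tok = draft_next(ctx)
        drafted.append(tok)
        ctx.append(tok)

    # 2) Verify. A real engine scores all drafted positions in a single
    #    parallel forward pass of the target; we simulate it sequentially.
    accepted, ctx = [], list(prefix)
    for tok in drafted:
        expected = target_next(ctx)
        if tok != expected:
            accepted.append(expected)  # keep the target's token, drop the rest
            break
        accepted.append(tok)
        ctx.append(tok)
    return accepted

# Toy stand-ins: the target repeats a pattern; the draft agrees most of the time.
PATTERN = [1, 2, 3, 4]

def target_next(ctx: List[int]) -> int:
    return PATTERN[len(ctx) % len(PATTERN)]

def draft_next(ctx: List[int]) -> int:
    return 0 if len(ctx) % 7 == 0 else target_next(ctx)  # wrong every 7th position

tokens = [0]
while len(tokens) < 20:
    tokens += speculative_step(tokens, draft_next, target_next)
print(tokens)  # several tokens are accepted per verification pass
```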

Together AI today announced research and a new system called ATLAS (AdapTive-LeArning Speculator System) that aims to help enterprises overcome the challenge of static speculators. The technique provides a self-learning inference optimization that can deliver up to 400% faster inference than the baseline performance of existing inference engines such as vLLM. The system addresses a critical problem: as AI workloads evolve, inference speeds degrade, even with specialized speculators in place.

The company, which got its start in 2023, has focused on optimizing inference on its enterprise AI platform. Earlier this year it raised $305 million as customer adoption and demand have grown.

"Companies we work with generally, as they scale up, they see shifting workloads, and then they don't see as much speedup from speculative execution as before," Tri Dao, chief scientist at Together AI, told VentureBeat in an exclusive interview. "These speculators generally don't work well when their workload domain starts to shift."

The workload drift problem no one talks about

Most speculators in production today are "static" models. They're trained once on a fixed dataset representing expected workloads, then deployed without any ability to adapt. Companies like Meta and Mistral ship pre-trained speculators alongside their main models. Inference platforms like vLLM use these static speculators to boost throughput without changing output quality.

But there's a catch. When an enterprise's AI usage evolves, the static speculator's accuracy plummets.

"If you're a company producing coding agents, and most of your developers have been writing in Python, all of a sudden some of them switch to writing Rust or C, then you see the speed starts to go down," Dao explained. "The speculator has a mismatch between what it was trained on versus what the actual workload is."

This workload drift represents a hidden tax on scaling AI. Enterprises either accept degraded performance or invest in retraining custom speculators. That process captures only a snapshot in time and quickly becomes outdated.

How adaptive speculators work: A dual-model approach

ATLAS uses a dual-speculator architecture that combines stability with adaptation:

The static speculator – A heavyweight model trained on broad data provides consistent baseline performance. It serves as a "speed floor."

The adaptive speculator – A lightweight model learns continuously from live traffic. It specializes on-the-fly to emerging domains and usage patterns.

The confidence-aware controller – An orchestration layer dynamically chooses which speculator to use. It adjusts the speculation "lookahead" based on confidence scores.

"Before the adaptive speculator learns anything, we still have the static speculator to help provide the speed boost in the beginning," Ben Athiwaratkun, staff AI scientist at Together AI explained to VentureBeat. "Once the adaptive speculator becomes more confident, then the speed grows over time."

The technical innovation lies in balancing acceptance rate (how often the target model agrees with drafted tokens) and draft latency. As the adaptive model learns from traffic patterns, the controller relies more on the lightweight speculator and extends lookahead. This compounds performance gains.
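
Together AI has not published the controller's internals, but a minimal sketch of the confidence-aware routing it describes might look like the following, where the threshold and depth values are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class ConfidenceAwareController:
    base_lookahead: int = 3        # conservative depth: the "speed floor"
    max_lookahead: int = 8         # depth once the adaptive model is trusted
    switch_threshold: float = 0.7  # confidence needed to prefer the adaptive model

    def choose(self, adaptive_confidence: float) -> tuple[str, int]:
        """Pick a speculator and a speculation depth for the next request.

        `adaptive_confidence` would be derived from the adaptive model's
        recent acceptance rate on live traffic."""
        if adaptive_confidence >= self.switch_threshold:
            # Trust the lightweight adaptive speculator and draft deeper:
            # higher acceptance rates make a longer lookahead pay off.
            extra = self.max_lookahead - self.base_lookahead
            return "adaptive", self.base_lookahead + int(extra * adaptive_confidence)
        # Otherwise fall back to the broadly trained static speculator.
        return "static", self.base_lookahead

ctrl = ConfidenceAwareController()
print(ctrl.choose(0.2))  # ('static', 3)   early in deployment
print(ctrl.choose(0.9))  # ('adaptive', 7) after adapting to the workload
```

The fallback branch is what provides the "speed floor" described above: even before the adaptive model has learned anything, throughput never drops below the static speculator's baseline.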

Users don't need to tune any parameters. "On the user side, users don't have to turn any knobs," Dao said. "On our side, we have turned these knobs for users to adjust in a configuration that gets good speedup."

Performance that rivals custom silicon

Together AI's testing shows ATLAS reaching 500 tokens per second on DeepSeek-V3.1 when fully adapted. More impressively, those numbers on Nvidia B200 GPUs match or exceed specialized inference chips like Groq's custom hardware.

"The software and algorithmic improvement is able to close the gap with really specialized hardware," Dao said. "We were seeing 500 tokens per second on these huge models that are even faster than some of the customized chips."

The 400% speedup that the company claims for inference represents the cumulative effect of Together's Turbo optimization suite. FP4 quantization delivers 80% speedup over FP8 baseline. The static Turbo Speculator adds another 80-100% gain. The adaptive system layers on top. Each optimization compounds the benefits of the others.
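
A quick back-of-the-envelope check shows how those layers multiply rather than add:

```python
# Assumed midpoints of the quoted ranges, not measured benchmarks.
fp4 = 1.8          # FP4 quantization: ~80% over the FP8 baseline
static_spec = 1.9  # static Turbo Speculator: another ~80-100%
print(f"{fp4 * static_spec:.2f}x")  # ~3.42x before the adaptive layer
# The adaptive speculator's gains on live traffic multiply on top of
# this, pushing the total toward the quoted 400% figure.
```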

Compared to standard inference engines like vLLM or Nvidia's TensorRT-LLM, the improvement is substantial. For each workload, Together AI benchmarks against whichever of the two is the stronger baseline before applying its speculative optimizations.

The memory-compute tradeoff explained

The performance gains stem from exploiting a fundamental inefficiency in modern inference: wasted compute capacity.

Dao explained that typically during inference, much of the compute power is not fully utilized.

"During inference, which is actually the dominant workload nowadays, you're mostly using the memory subsystem," he said.

Speculative decoding trades idle compute for reduced memory access. When a model generates one token at a time, it's memory-bound. The GPU sits idle while waiting for memory. But when the speculator proposes five tokens and the target model verifies them simultaneously, compute utilization spikes while memory access remains roughly constant.

"The total amount of compute to generate five tokens is the same, but you only had to access memory once, instead of five times," Dao said.

Think of it as intelligent caching for AI

For infrastructure teams familiar with traditional database optimization, adaptive speculators function like an intelligent caching layer, but with a crucial difference.

Traditional caching systems like Redis or memcached require exact matches. You store the exact same query result and retrieve it when that specific query runs again. Adaptive speculators work differently.

"You can view it as an intelligent way of caching, not storing exactly, but figuring out some patterns that you see," Dao explained. "Broadly, we're observing that you're working with similar code, or working with similar, you know, controlling compute in a similar way. We can t…

Content truncated automatically.

🔗 Source: venturebeat.com


📌 MAROKO133 Breaking ai: High-pressure monster gun to make 40-ton smart military vehicle

A strategic tactical military vehicle is set to get an upgrade that can help meet the evolving demands of the battlefield. The latest CV90120 vehicle is set to be integrated with the 120mm L44A1 Low Recoil (LR) weapon system.

BAE Systems Hägglunds has teamed up with Rheinmetall Weapon and Ammunition to complete the integration.

Developed by BAE Systems, CV90120 provides high tactical and strategic mobility with anti-tank capability. The vehicle offers high survivability in any terrain or environment, delivering unrivalled performance with a unique combat system design.

High-pressure gun solution

The integration of Rheinmetall’s 120mm L44A1 LR gun is expected to bolster the military vehicle’s lethality.

The system is a high-pressure gun solution that delivers outstanding performance and is compatible with all NATO-standard 120mm rounds, including Rheinmetall's DM11 3-mode programmable high-explosive (HE) round and enhanced kinetic energy (KE) rounds.

“Our customers will benefit from the combined expertise and resources of our two companies, resulting in a solution that offers enhanced firepower, protection, and mobility – a winning combination on the modern battlefield,” said Tarkan Turkcan, CV90 platform director.

Enhanced firepower

BAE Systems claims that its new CV90120 is based on the latest CV90MkIV variant chassis and will provide enhanced battlefield speeds, upgraded electronic architecture, active damping, a new engine, a heavy-duty transmission, and an active protection system.

It also offers the added advantage of a common platform for armies that have already selected the CV90 IFV, or will do so in the future, through commonality in training and spare parts.

The CV90120, with a payload capacity of 40 tons, combines the firepower of a modern main battle tank with the superior strategic and tactical mobility of the CV90 infantry fighting vehicle, delivering heavy, direct firepower at a significantly reduced cost, according to the company.

CV90 is a highly versatile vehicle

BAE Systems also highlighted that the CV90 is a highly versatile vehicle for all military missions, designed to provide superior protection, increased mobility, and evolutionary scalability. BAE Systems Hägglunds has produced more than 1,400 CV90s in 17 variants for 10 European nations as part of the CV90 User Group. The CV90 has a pedigree of successful worldwide operations, including UN and NATO collaborative missions.

The Germany-based Rheinmetall is one of the world’s leading suppliers of large-calibre weapons and ammunition. The company’s globally acclaimed smoothbore L44 and L55 tank guns combine tremendous firepower with a high first-round hit probability, even when the tank is on the move.

Particularly in the field of weapons and ammunition, recent technological breakthroughs have enabled Rheinmetall to make a vital contribution to the unsurpassed combat effectiveness of the Leopard MBT, according to Rheinmetall.

As the standard NATO smoothbore gun for the Leopard 2 and M1A1 Abrams main battle tanks, Rheinmetall's L44 tank gun proved superior to all rivals in the 120mm arena.

Apart from serving as the main armament for the world's finest main battle tank, the Leopard 2, this high-performance gun has been integrated into a number of foreign tanks and is manufactured under licence in the United States. In combination with Rheinmetall's KE, HE, and PELE ammunition families, the L44's firepower gives tanks the edge even in long-range engagements. This smoothbore gun is the precursor of Rheinmetall's L55 and LLR/L47 tank guns, according to the company.

🔗 Source: interestingengineering.com


🤖 MAROKO133 Note

This article is an automated summary drawn from several trusted sources. We pick trending topics so you always stay up to date.

✅ Next update in 30 minutes: a random theme awaits!

Author: timuna