MAROKO133 Update ai: Meta’s DreamGym framework trains AI agents in a simulated world

📌 MAROKO133 Hot ai: Meta’s DreamGym framework trains AI agents in a simulated world

Researchers at Meta, the University of Chicago, and UC Berkeley have developed a new framework that addresses the high costs, infrastructure complexity, and unreliable feedback associated with using reinforcement learning (RL) to train large language model (LLM) agents. The framework, DreamGym, simulates an RL environment to train agents for complex applications. As training progresses, the framework dynamically adjusts task difficulty, so the agent is steadily pushed toward harder problems as it improves.

Experiments by the research team show that DreamGym substantially improves RL training in both fully synthetic settings and scenarios where the model must apply its simulated learning to the real world. In settings where RL is possible but expensive, it matches the performance of popular algorithms using only synthetic interactions, significantly cutting the costs of data gathering and environment interaction. 

This approach could be vital for enterprises, allowing them to train agents for bespoke applications while avoiding the complexities of setting up and running live RL environments.

The challenge of training LLM agents

Reinforcement learning is a key technique for training LLMs to handle complex tasks in agentic environments, such as web navigation, tool use, and robotics. It allows models to learn from direct interaction and experience, moving beyond the static datasets used in pre-training.

However, RL for agent training remains difficult. Real-world applications often involve long action sequences with sparse rewards, meaning the agent receives a positive signal only after completing a long, correct sequence of actions. 

Gathering enough diverse and validated data is also expensive, frequently requiring human experts to verify tasks and annotate outcomes. And the infrastructure required to create the live environments for large-scale RL training can be prohibitively complex and costly. Not to mention that interacting with live systems carries risks, as wrong actions (like deleting a file) can cause irreparable damage.

“These limitations make building general-purpose and scalable systems for training agents with RL an open and pressing challenge,” the researchers write.

DreamGym directly challenges that status quo by delivering comparable performance entirely in simulation, removing the infrastructure burden that has kept most enterprises from adopting RL — and giving teams a practical path to train agents without touching costly or risky live environments.

How DreamGym works

The researchers describe DreamGym as a “unified and scalable RL framework that synthesizes diverse experience data in an online manner to enable efficient and effective training of LLM agents.” It is built around three core components that work together to create a controlled and effective training loop.

The first component is a “reasoning-based experience model” that translates the dynamics of a target environment into a textual space. This model acts as the simulator of the application environment. Instead of interacting with a costly real environment, the agent interacts with this model, which generates consistent state transitions and feedback based on the agent’s actions. 

The researchers argue that agent training doesn't need perfectly realistic environments, but rather data that is "sufficiently diverse, informative, and causally grounded." For example, in a web shopping task, the model synthesizes clean listings of on-page elements rather than processing raw HTML code. This abstract approach makes training the experience model highly efficient, requiring only a small amount of public data.
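To make this concrete, here is a minimal, hypothetical sketch of the interface such a reasoning-based experience model might expose. In DreamGym, the next state and reward are generated by a reasoning LLM; the hard-coded rule-based transitions below are stand-ins for illustration only, and all names are assumptions rather than the paper's actual code.

```python
from dataclasses import dataclass

@dataclass
class Step:
    observation: str  # abstract textual state, not raw HTML
    reward: float
    done: bool

class ExperienceModel:
    """Toy textual simulator of a web-shopping environment (illustrative stand-in)."""
    def __init__(self):
        self.state = "search_page"

    def step(self, action: str) -> Step:
        # In DreamGym this transition would come from a reasoning LLM;
        # two hard-coded transitions illustrate the interface.
        if self.state == "search_page" and action.startswith("search"):
            self.state = "results_page"
            return Step("results: [item_1 $19] [item_2 $25]", 0.0, False)
        if self.state == "results_page" and action == "buy item_1":
            return Step("purchase confirmed: item_1", 1.0, True)
        return Step(f"no effect in state {self.state}", 0.0, False)

env = ExperienceModel()
env.step("search 'red mug'")
result = env.step("buy item_1")
print(result.reward)  # 1.0
```

The key design point mirrored here is that observations are clean textual listings of on-page elements, which is what makes the experience model cheap to train and query.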

The second component is an “experience replay buffer,” which acts as a dynamic memory. At the beginning of the training process, the buffer is seeded with offline data to provide essential context and is continuously updated with new synthetic trajectories generated during training. This buffer helps guide the experience model's predictions, ensuring the synthetic experiences remain diverse and factually grounded. 
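A minimal sketch of such a replay buffer follows; the class name, API, and capacity are assumptions, not DreamGym's actual code. It is seeded with offline trajectories, extended with synthetic ones during training, and sampled to ground the experience model's predictions.

```python
import random
from collections import deque

class ReplayBuffer:
    """Illustrative replay buffer: seeded offline, updated online (assumed API)."""
    def __init__(self, seed_trajectories, capacity=10_000):
        # deque with maxlen evicts the oldest trajectories once full
        self.buffer = deque(seed_trajectories, maxlen=capacity)

    def add(self, trajectory):
        self.buffer.append(trajectory)

    def sample(self, k):
        # k grounding examples, e.g. for the experience model's prompt
        return random.sample(list(self.buffer), min(k, len(self.buffer)))

buf = ReplayBuffer([{"task": "find mug", "success": True}])   # offline seed
buf.add({"task": "compare prices", "success": False})          # synthetic rollout
print(len(buf.sample(2)))  # 2
```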

The third component, a “curriculum task generator,” works in tandem with the experience model to adaptively create new tasks that are progressively more challenging. It identifies tasks where the agent's performance is mixed (signaling they are difficult but solvable) and generates variations to push the agent's capabilities.
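The selection rule can be illustrated with a toy sketch; the thresholds and function name below are assumptions, not the paper's exact criterion. Tasks with mixed recent outcomes are treated as "difficult but solvable" and become candidates for new variations.

```python
def select_frontier_tasks(history, low=0.2, high=0.8):
    """history: {task: [bool outcomes]} -> tasks worth generating variations of.
    Thresholds are illustrative assumptions, not DreamGym's actual criterion."""
    frontier = []
    for task, outcomes in history.items():
        rate = sum(outcomes) / len(outcomes)
        if low <= rate <= high:  # mixed success: hard but solvable
            frontier.append(task)
    return frontier

history = {
    "buy cheapest mug": [True, False, True],   # mixed -> frontier
    "open homepage": [True, True, True],       # already solved -> skip
    "book flight": [False, False, False],      # too hard -> skip
}
print(select_frontier_tasks(history))  # ['buy cheapest mug']
```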

Together, these components create a closed-loop system for scalable agent training. “By unifying interaction, memory, and adaptive online task generation, DreamGym addresses the persistent challenges that have limited RL for LLM agents training: prohibitive cost, scarcity of diverse tasks, unstable reward signals, and heavy infrastructure demands,” according to the researchers.
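Putting the pieces together, the closed loop might be sketched schematically as follows. Every component here is an illustrative stand-in (a coin flip plays the experience model, string suffixes play task variation), not DreamGym's implementation.

```python
import random

def run_training_loop(n_episodes=5):
    """Schematic DreamGym-style loop: simulate, store, adapt (all stand-ins)."""
    buffer = [{"task": "seed task", "success": True}]  # replay buffer, seeded offline
    tasks = ["buy cheapest mug"]                       # curriculum task pool
    for _ in range(n_episodes):
        task = random.choice(tasks)
        # 1) experience model simulates a trajectory (coin-flip stand-in)
        success = random.random() < 0.5
        # 2) trajectory enters the replay buffer to ground later rollouts
        buffer.append({"task": task, "success": success})
        # 3) curriculum generator spawns harder variations of solvable tasks
        if success:
            tasks.append(task + " under $20")
    return buffer, tasks

buf, tasks = run_training_loop()
print(len(buf))  # 6 (1 seed + 5 episodes)
```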

DreamGym in action

The researchers evaluated DreamGym across several agent benchmarks, including WebShop (e-commerce), ALFWorld (embodied control), and WebArena (realistic web interaction). They used Llama 3 and Qwen 2.5 models as agent backbones and compared DreamGym against several traditional training strategies. These included offline methods like supervised fine-tuning (SFT) and direct preference optimization (DPO), as well as online RL algorithms like Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO), which improve agents through live environment interaction.

DreamGym showed its most significant advantage in environments like WebArena, where setting up a large-scale RL infrastructure is difficult. Agents trained entirely inside DreamGym achieved success rates over 30% higher than baseline methods, which struggled with the sparse rewards and limited exploration in the real environment. The researchers said this shows DreamGym is a mechanism that makes RL training “feasible in domains that were previously intractable due to inherent task and engineering constraints.”

In environments where RL is supported but costly, agents trained with DreamGym performed on par with those trained using GRPO and PPO, but without any costly interactions with the external environment. The team also introduced a sim-to-real approach, DreamGym-S2R, where an agent is first trained in the synthetic environment and then fine-tuned on a small amount of real-world data. This strategy yielded over a 40% performance improvement compared to training from scratch in the real environment while using less than 10% of the external data. This provides a scalable "warm-start" for training general-purpose agents.
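The warm-start recipe reduces to two phases, sketched below with a toy placeholder training call; the function names and step counts are illustrative assumptions, not DreamGym's API.

```python
def train(policy, env_name, steps):
    # Toy stand-in: a real implementation would run PPO/GRPO updates
    # against the named environment for the given number of steps.
    policy["experience"][env_name] = policy["experience"].get(env_name, 0) + steps
    return policy

def dreamgym_s2r(policy, synthetic_steps=100_000, real_steps=10_000):
    """Illustrative two-phase sim-to-real recipe (step counts are assumptions)."""
    policy = train(policy, "synthetic", synthetic_steps)  # phase 1: cheap simulation
    return train(policy, "real", real_steps)              # phase 2: small real fine-tune

policy = dreamgym_s2r({"experience": {}})
print(policy["experience"]["real"])  # 10000
```

The design point mirrored here is the data ratio: the real-environment phase uses a small fraction of the interactions of the synthetic phase, matching the paper's report of under 10% external data.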

Finally, the framework demonstrated strong generalization. An agent trained on tasks in one domain, such as WebShop, could successfully transfer its learned skills to another, like WebArena. The researchers suggest this is because DreamGym agents learn in an "abstract meta-representation space, enabling the agent to learn domain-agnostic behavioral priors rather than memorizing task-specific patterns."

While still in its early stages, DreamGym shows that simulated environments can deliver substantial gains in agent training. In practice, an enterprise could gather a small set of trajectories and task descriptions for the workflows it wants to automate, then use that seed to bootstrap the DreamGym framework for scalable, sample-efficient agent training.

🔗 Sumber: venturebeat.com


📌 MAROKO133 Breaking ai: High-grade encryption solution protects classified communications

A new encryption solution has been launched to protect classified communications against emerging threats. Thales’ MISTRAL post-quantum encryptor offers a certified and qualified level of security for restricted-level communications.

The cutting-edge security solution is claimed to be capable of resisting quantum attacks.

Thales highlighted that the MISTRAL encryptor is intended for public administrations, operators of vital importance, and companies within the defence technological and industrial base.

Requiring a high degree of data protection

The system fully aligns with ANSSI recommendations and is certified to Common Criteria EAL4+. It is ready for deployment in European projects requiring a high degree of data protection between industrial partners and high-technology stakeholders, according to a press release.

The solution has been launched at the European Cyber Week, held in Rennes, France.

“By anticipating tomorrow’s challenges, Thales will, from June 2026, provide France and its European partners with a high-grade encryption solution capable of resisting quantum attacks,” said Pierre Jeanne, Vice-President for Sovereign Cybersecurity activities at Thales.

“Public administrations, operators of vital importance, and companies in the defence technological and industrial base will benefit from a state-of-the-art encryptor to shield their Restricted-level communications against the quantum threat.”

MISTRAL encryptor retains its renowned ease of use and high performance

The company also pointed out that the MISTRAL encryptor retains its renowned ease of use and high performance. The Thales solution ensures a very high level of security while delivering optimal performance, with throughput of up to 4 × 10 Gbps and very low latency. It also stands out for its ease of integration, supported in particular by centralised management.

MISTRAL has already entered operational testing, with availability scheduled for June 2026. In launching this solution, Thales further strengthens its technological leadership in cybersecurity and supports its customers in their transition towards a trusted future, where the security of information exchange is more than ever a strategic priority, as per the release.

Major step for European cybersecurity, especially in terms of preparing for a quantum future

Thales has claimed that MISTRAL is built to be quantum-resistant, meaning it can reportedly survive attacks from future quantum computers, which could potentially break many of today’s standard encryption schemes.

While MISTRAL is certified to EAL4+ and aligned with ANSSI, global interoperability (especially with non-European entities) could be a challenge.

So far, it is not entirely clear which post-quantum cryptographic algorithms MISTRAL uses, as Thales has not disclosed all details publicly. It is also not yet clear how the algorithms might evolve as PQC (post-quantum cryptography) standards themselves mature.

Thales’ MISTRAL encryptor could be a major step for European cybersecurity — especially in terms of preparing for a quantum future. It’s not just a product, but also a signal that Europe is serious about developing sovereign, quantum-resistant communication infrastructure.

🔗 Sumber: interestingengineering.com


🤖 MAROKO133 Notes

This article is an automated summary drawn from several trusted sources. We pick trending topics so you always stay up to date.

✅ Next update in 30 minutes: a random theme awaits!

Author: timuna