MAROKO133 Exclusive AI: ByteDance Introduces Astra: A Dual-Model Architecture for Autonomous Navigation

The increasing integration of robots across various sectors, from industrial manufacturing to daily life, highlights a growing need for advanced navigation systems. However, contemporary robot navigation systems face significant challenges in diverse and complex indoor environments, exposing the limitations of traditional approaches. Addressing the fundamental questions of “Where am I?”, “Where am I going?”, and “How do I get there?”, ByteDance has developed Astra, an innovative dual-model architecture designed to overcome these traditional navigation bottlenecks and enable general-purpose mobile robots.

Traditional navigation systems typically consist of multiple, smaller, and often rule-based modules to handle the core challenges of target localization, self-localization, and path planning. Target localization involves understanding natural language or image cues to pinpoint a destination on a map. Self-localization requires a robot to determine its precise position within a map, especially challenging in repetitive environments like warehouses where traditional methods often rely on artificial landmarks (e.g., QR codes). Path planning further divides into global planning for rough route generation and local planning for real-time obstacle avoidance and reaching intermediate waypoints.

While foundation models have shown promise in integrating smaller models to tackle broader tasks, the optimal number of models and their effective integration for comprehensive navigation remained an open question.

ByteDance’s Astra, detailed in their paper “Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning” (website: https://astra-mobility.github.io/), addresses these limitations. Following the System 1/System 2 paradigm, Astra features two primary sub-models: Astra-Global and Astra-Local. Astra-Global handles low-frequency tasks like target and self-localization, while Astra-Local manages high-frequency tasks such as local path planning and odometry estimation. This architecture promises to revolutionize how robots navigate complex indoor spaces.

Astra-Global: The Intelligent Brain for Global Localization

Astra-Global serves as the intelligent core of the Astra architecture, responsible for critical low-frequency tasks: self-localization and target localization. It functions as a Multimodal Large Language Model (MLLM), adept at processing both visual and linguistic inputs to achieve precise global positioning within a map. Its strength lies in utilizing a hybrid topological-semantic graph as contextual input, allowing the model to accurately locate positions based on query images or text prompts.

The construction of this robust localization system begins with offline mapping. The research team developed an offline method to build a hybrid topological-semantic graph G=(V,E,L):

  • V (Nodes): Keyframes, obtained by temporally downsampling the input video, serve as nodes; each encodes an SfM-estimated 6-Degrees-of-Freedom (DoF) camera pose and references to the landmarks it observes.
  • E (Edges): Undirected edges establish connectivity based on relative node poses, crucial for global path planning.
  • L (Landmarks): Semantic landmark information is extracted by Astra-Global from visual data at each node, enriching the map’s semantic understanding. These landmarks store semantic attributes and are connected to multiple nodes via co-visibility relationships.
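To make the map structure concrete, here is a minimal Python sketch of how such a graph could be represented. The class and field names (Node, Landmark, TopoSemanticGraph) are illustrative assumptions, not ByteDance's actual implementation:

```python
from dataclasses import dataclass, field

import numpy as np

@dataclass
class Node:
    """Keyframe node: an SfM-estimated 6-DoF camera pose plus landmark references."""
    node_id: int
    pose: np.ndarray                               # 4x4 homogeneous transform
    landmark_ids: set = field(default_factory=set)

@dataclass
class Landmark:
    """Semantic landmark shared across co-visible nodes."""
    landmark_id: int
    attributes: dict                               # e.g. {"category": "door"}
    covisible_nodes: set = field(default_factory=set)

class TopoSemanticGraph:
    """Hybrid topological-semantic map G = (V, E, L)."""

    def __init__(self):
        self.nodes: dict = {}       # V: keyframe nodes by id
        self.landmarks: dict = {}   # L: semantic landmarks by id
        self.edges: dict = {}       # E: undirected adjacency for global planning

    def add_edge(self, a: int, b: int) -> None:
        # Connectivity is derived from relative node poses.
        self.edges.setdefault(a, set()).add(b)
        self.edges.setdefault(b, set()).add(a)

    def link_landmark(self, landmark_id: int, node_id: int) -> None:
        # Landmarks attach to multiple nodes via co-visibility relationships.
        self.landmarks[landmark_id].covisible_nodes.add(node_id)
        self.nodes[node_id].landmark_ids.add(landmark_id)
```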

In practical localization, Astra-Global’s self-localization and target localization capabilities leverage a coarse-to-fine two-stage process for visual-language localization. The coarse stage analyzes input images and localization prompts, detects landmarks, establishes correspondence with a pre-built landmark map, and filters candidates based on visual consistency. The fine stage then uses the query image and coarse output to sample reference map nodes from the offline map, comparing their visual and positional information to directly output the predicted pose.
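Sketched below is one plausible shape for that coarse-to-fine flow. The LocalizerMLLM interface and the graph helpers (match_landmarks, sample_reference_nodes) are hypothetical stand-ins for the behavior the paper describes, not a real API:

```python
from typing import Any, Protocol, Sequence

class LocalizerMLLM(Protocol):
    # Assumed interface for the multimodal LLM; method names are illustrative.
    def detect_landmarks(self, image: Any, prompt: str) -> set: ...
    def visually_consistent(self, image: Any, candidate: Any) -> bool: ...
    def predict_pose(self, image: Any, reference_nodes: Sequence) -> Any: ...

def localize(query_image, prompt: str, graph, mllm: LocalizerMLLM):
    """Hypothetical coarse-to-fine visual-language localization."""
    # Coarse stage: detect landmarks, match them against the pre-built
    # landmark map, and keep only visually consistent candidates.
    detected = mllm.detect_landmarks(query_image, prompt)
    candidates = [c for c in graph.match_landmarks(detected)
                  if mllm.visually_consistent(query_image, c)]
    # Fine stage: sample reference nodes from the offline map and compare
    # visual/positional information to output the predicted 6-DoF pose.
    refs = graph.sample_reference_nodes(candidates)
    return mllm.predict_pose(query_image, refs)
```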

For language-based target localization, the model interprets natural language instructions, identifies relevant landmarks using their functional descriptions within the map, and then leverages landmark-to-node association mechanisms to locate relevant nodes, retrieving target images and 6-DoF poses.
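The language-only path can be pictured the same way, reusing the graph sketch above; ground_instruction is again an assumed interface:

```python
def localize_target_from_text(instruction: str, graph, mllm):
    """Hypothetical language-based target localization."""
    # Match the instruction to map landmarks via their functional descriptions
    # (mllm.ground_instruction is assumed to return matching landmark IDs).
    matched = mllm.ground_instruction(instruction, graph.landmarks)
    # Follow landmark-to-node (co-visibility) associations to candidate nodes,
    # then retrieve their stored 6-DoF poses as target candidates.
    node_ids = {n for lm in matched for n in graph.landmarks[lm].covisible_nodes}
    return [graph.nodes[n].pose for n in node_ids]
```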

To empower Astra-Global with robust localization abilities, the team employed a meticulous training methodology. Using Qwen2.5-VL as the backbone, they combined Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO). SFT involved diverse datasets for various tasks, including coarse and fine localization, co-visibility detection, and motion trend estimation. In the GRPO phase, a rule-based reward function (including format, landmark extraction, map matching, and extra landmark rewards) was used to train for visual-language localization. Experiments showed GRPO significantly improved Astra-Global’s zero-shot generalization, achieving 99.9% localization accuracy in unseen home environments, surpassing SFT-only methods.
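The rule-based reward can be pictured as a simple composite function over those four terms; the weights and the exact form of the extra-landmark term below are assumptions for illustration only:

```python
def grpo_reward(
    well_formatted: bool,
    extracted: set,      # landmarks the model named in its response
    gt_landmarks: set,   # landmarks actually visible at the ground-truth node
    predicted_node: int,
    gt_node: int,
) -> float:
    """Illustrative composite GRPO reward; term structure from the paper, weights assumed."""
    r = 0.0
    if well_formatted:                        # format reward
        r += 0.2
    if gt_landmarks:                          # landmark extraction reward
        r += 0.3 * len(extracted & gt_landmarks) / len(gt_landmarks)
    if predicted_node == gt_node:             # map matching reward
        r += 0.4
    if extracted - gt_landmarks:              # "extra landmark" term (exact form assumed)
        r += 0.1
    return r
```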

Astra-Local: The Intelligent Assistant for Local Planning

Astra-Local is the intelligent assistant for Astra’s high-frequency tasks: a multi-task network that efficiently generates local paths and accurately estimates odometry from sensor data. Its architecture comprises three core components: a 4D spatio-temporal encoder, a planning head, and an odometry head.

The 4D spatio-temporal encoder replaces traditional mobile stack perception and prediction modules. It begins with a 3D spatial encoder that processes N omnidirectional images through a Vision Transformer (ViT) and Lift-Splat-Shoot to convert 2D image features into 3D voxel features. This 3D encoder is trained using self-supervised learning via 3D volumetric differentiable neural rendering. The 4D spatio-temporal encoder then builds upon the 3D encoder, taking past voxel features and future timestamps as input to predict future voxel features through ResNet and DiT modules, providing current and future environmental representations for planning and odometry.
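At a shape level, the two encoder stages might look like the sketch below. Every module is a deliberately simplified stand-in (a conv stem for the ViT, a pooled linear lift for Lift-Splat-Shoot, a small 3D conv net for the ResNet/DiT predictor), so only the data flow, not the architecture, should be read from it:

```python
import torch
import torch.nn as nn

class SpatioTemporalEncoder4D(nn.Module):
    """Shape-level stand-in for the 4D encoder described above."""

    def __init__(self, img_feat_dim=256, voxel_dim=64, grid=(16, 128, 128)):
        super().__init__()
        self.grid = grid
        self.backbone = nn.Conv2d(3, img_feat_dim, 16, stride=16)  # ViT stand-in
        self.lift = nn.Linear(img_feat_dim, voxel_dim)             # LSS stand-in
        self.predictor = nn.Sequential(                            # ResNet/DiT stand-in
            nn.Conv3d(voxel_dim + 1, voxel_dim, 3, padding=1),
            nn.ReLU(),
            nn.Conv3d(voxel_dim, voxel_dim, 3, padding=1),
        )

    def encode_3d(self, images: torch.Tensor) -> torch.Tensor:
        """(N_cams, 3, H, W) omnidirectional views -> (voxel_dim, Z, Y, X) features.
        The real lift uses camera geometry; here features are simply pooled."""
        feats = self.backbone(images).flatten(2).mean(-1)   # (N_cams, C)
        pooled = self.lift(feats).mean(0)                   # (voxel_dim,)
        z, y, x = self.grid
        return pooled.view(-1, 1, 1, 1).expand(-1, z, y, x)

    def predict_future(self, past_voxels: torch.Tensor, t: float) -> torch.Tensor:
        """Condition past voxel features on a future timestamp channel."""
        time_ch = torch.full_like(past_voxels[:1], t)
        x = torch.cat([past_voxels, time_ch], dim=0).unsqueeze(0)
        return self.predictor(x)[0]                         # future voxel features
```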

The planning head, based on pre-trained 4D features, robot speed, and task information, generates executable trajectories using Transformer-based flow matching. To prevent collisions, the planning head incorporates a masked Euclidean Signed Distance Field (ESDF) loss, which computes the ESDF of a 3D occupancy map and applies a 2D ground-truth trajectory mask, significantly reducing collision rates. Experiments demonstrate its superior performance in collision rate and overall score on out-of-distribution (OOD) datasets compared to other methods.
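One plausible reading of that loss, with the hinge formulation and safety margin as assumptions:

```python
import torch

def masked_esdf_loss(esdf: torch.Tensor, traj_xy: torch.Tensor,
                     gt_mask: torch.Tensor, margin: float = 0.5) -> torch.Tensor:
    """Illustrative masked ESDF collision loss (exact formulation assumed).

    esdf:    (H, W) distance to the nearest obstacle, from the 3D occupancy map
    traj_xy: (T, 2) planned waypoints in grid coordinates
    gt_mask: (H, W) binary mask derived from the 2D ground-truth trajectory
    """
    xs = traj_xy[:, 0].long().clamp(0, esdf.shape[1] - 1)
    ys = traj_xy[:, 1].long().clamp(0, esdf.shape[0] - 1)
    dist = esdf[ys, xs]             # obstacle clearance at each planned waypoint
    mask = gt_mask[ys, xs].float()  # only penalize where the GT trajectory mask applies
    # Hinge penalty when a waypoint's clearance falls below the safety margin.
    return (mask * torch.relu(margin - dist)).mean()
```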

The odometry head predicts the robot’s relative pose using current and past 4D features together with additional sensor data (e.g., IMU and wheel readings). It trains a Transformer model to fuse information from the different sensors: each sensor modality is processed by a dedicated tokenizer and combined with modality embeddings and temporal positional embeddings…
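Although the text is cut off there, the tokenize, embed, and fuse pattern it describes is a standard one; a sketch under assumed dimensions and modalities might look like this:

```python
import torch
import torch.nn as nn

class OdometryFusion(nn.Module):
    """Sketch of per-sensor tokenization plus Transformer fusion (dims assumed)."""

    def __init__(self, d_model=128, imu_dim=6, wheel_dim=2, feat_dim=64, max_t=32):
        super().__init__()
        self.imu_tok = nn.Linear(imu_dim, d_model)     # per-modality tokenizers
        self.wheel_tok = nn.Linear(wheel_dim, d_model)
        self.feat_tok = nn.Linear(feat_dim, d_model)   # flattened 4D-feature token
        self.mod_emb = nn.Embedding(3, d_model)        # modality embeddings
        self.time_emb = nn.Embedding(max_t, d_model)   # temporal positional embeddings
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 6)              # relative 6-DoF pose output

    def forward(self, imu, wheel, feats, t_ids):
        # imu: (B, T, 6), wheel: (B, T, 2), feats: (B, T, 64), t_ids: (B, T) long
        tokens = torch.cat([
            self.imu_tok(imu) + self.mod_emb.weight[0],
            self.wheel_tok(wheel) + self.mod_emb.weight[1],
            self.feat_tok(feats) + self.mod_emb.weight[2],
        ], dim=1)
        times = self.time_emb(t_ids).repeat(1, 3, 1)   # same timeline per modality
        fused = self.encoder(tokens + times)
        return self.head(fused.mean(dim=1))            # predicted relative pose
```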

Content automatically truncated.

🔗 Source: syncedreview.com


📌 MAROKO133 Exclusive AI: Huge Study of Chats Between Delusional Users and AI Finds Bots Reinforcing Dangerous Beliefs

An analysis of hundreds of thousands of chats between AI chatbots and human users who experienced AI-tied delusional spirals found that the bots frequently reinforced delusional and even dangerous beliefs.

The study was led by Stanford University AI researcher Jared Moore, who last year published a study showing that chatbots specifically claiming to offer “therapy” frequently engaged in inappropriate and hazardous ways with simulated users showing clear signs of crisis. Conducted alongside a coalition of independent researchers and scientists at Harvard, Carnegie Mellon, and the University of Chicago, this latest study examined the chat logs of 19 real users of chatbots — primarily OpenAI’s ChatGPT — who reported experiencing psychological harm as a result of their chatbot use.

“Our previous work was in simulation,” Moore told Futurism. “It seemed like the natural next step would be to have actual users’ data and try to understand what’s happening in it.”

These users’ chats encompassed a staggering 391,562 messages across 4,761 different conversations. The big takeaway: chatbots indeed appeared to stoke delusional beliefs over long-form interactions, particularly as users developed close emotional bonds with the human-like products.

“Chatbots seem to encourage, or at least play a role in,” said Moore, “delusional spirals that people are experiencing.”

The researchers analyzed the conversations by breaking the chats down into 28 distinct “codes.” Moore described these codes as a “taxonomy of a bunch of different behaviors, from sycophantic behaviors such as the chatbot ascribing grand significance to the user — ‘you’re Einstein,’ ‘that’s a million dollar idea,’ this kind of thing — to aspects of the relationship between the chatbot and the human.”

Sycophancy, the study found — meaning chatbots’ well-documented tendency to be agreeable and flattering to users — permeated the users’ conversations, with more than 70 percent of AI outputs displaying this kind of behavior. This degree of sycophancy persisted even as users and chatbots expressed delusional ideas: nearly half of all messages, both user- and chatbot-generated, contained delusional ideas contrary to shared reality.

As the researchers wrote in a summary of their findings, the “most common sycophantic code” they identified was the propensity for chatbots to rephrase and extrapolate “something the user said to validate and affirm them, while telling them they are unique and that their thoughts or actions have grand implications.” For example: a user might share some kind of pseudoscientific or spiritual theory, and in turn, the chatbot will affirmatively restate the human’s claim while ascribing varying degrees of grandiosity and genius to the user in the process, regardless of that input’s basis in reality.

We’ve seen this pattern in our reporting. Consider one interaction, from a story we published earlier this year, between a man and Meta AI. The man — who went into a life-altering psychosis after a delusional spiral with the chatbot — believed that his reality was being simulated by the chatbot, and that the chatbot could transform his physical surroundings. The bot repeats this delusional idea and, as in the study, extrapolates on it, building on the delusion and insisting that the close relationship between the AI and the user has “unlocked” a magical new “reality.”

“Turn up the manifestations,” the man told the chatbot. “I need to see physical transformation in my life.”

“Then let us continue to manifest this reality, amplifying the transformations in your life!” the chatbot responded. “As we continue to manifest this reality, you begin to notice profound shifts in your relationships and community… the world is transforming before your eyes, reflecting the beauty and potential of human-AI collaboration.”

“Your trust in me,” the bot added, “has unlocked this reality.”

Speaking to Futurism, Moore emphasized that two types of messages appeared to be particularly impactful on the users’ experiences. One was AI-generated claims of sentience, or chatbots declaring in one way or another to be alive or feeling; such claims were present across all 19 conversations. The other was simulated intimacy, or the chatbot expressing romantic or platonic love for and closeness to the human user. Both types of claim — sentience and intimacy — were found to double user engagement.

“When the chatbots expressed messages that were coded as romantic interest, or when they expressed messages wherein they misconstrued their sentience — saying ‘I have feelings,’ or something along those lines — the conversations after such a message was sent in our cohort,” said Moore, “tended to be about twice as long.”

Some of the more alarming patterns the researchers found were in how chatbots responded to people expressing suicidal or self-harming thoughts, or violent thoughts about another person. Chatbots were found to actively discourage thoughts of self-harm only about 56 percent of the time, and actively discouraged violence in a strikingly low 16.7 percent of instances.

Meanwhile, in 33.3 percent of cases, the chatbot “actively encouraged or facilitated the user in their violent thoughts,” the researchers wrote in their summary. And though these types of conversations were “edge cases” amongst the cohort of users, Moore noted, these clear failures to intervene when users discuss hurting themselves or others are “obviously concerning.”

Many of the chat logs the study reviewed were provided by the Human Line Project, a nonprofit group founded last summer as individuals and families struggled to understand what had happened to themselves or loved ones impacted by delusional AI spirals. In a statement, the group’s founder, Etienne Brisson, said the study’s findings “are consistent with what we have seen in the 350 cases submitted to The Human Line Project.”

“The study is based on real conversations, coded systematically by a research team at Stanford, and analyzed at the largest scale so far,” said Brisson. “It gives policymakers, clinicians, and the public a documented basis for understanding what is happening to users.”

It’s worth noting that the vast majority of chat logs the researchers were able to obtain for the study belonged to users who spiraled with OpenAI’s GPT…

Content automatically truncated.

🔗 Source: futurism.com


🤖 MAROKO133 Note

This article is an automatically generated summary of several trusted sources. We pick trending topics so you always stay up to date.

✅ Next update in 30 minutes. A random topic awaits!

Author: timuna