📌 MAROKO133 Breaking AI: ByteDance Introduces Astra: A Dual-Model Architecture for Mobile Robot Navigation
The increasing integration of robots across various sectors, from industrial manufacturing to daily life, highlights a growing need for advanced navigation systems. However, contemporary robot navigation systems face significant challenges in diverse and complex indoor environments, exposing the limitations of traditional approaches. Addressing the fundamental questions of “Where am I?”, “Where am I going?”, and “How do I get there?”, ByteDance has developed Astra, an innovative dual-model architecture designed to overcome these traditional navigation bottlenecks and enable general-purpose mobile robots.
Traditional navigation systems typically consist of multiple, smaller, and often rule-based modules to handle the core challenges of target localization, self-localization, and path planning. Target localization involves understanding natural language or image cues to pinpoint a destination on a map. Self-localization requires a robot to determine its precise position within a map, especially challenging in repetitive environments like warehouses where traditional methods often rely on artificial landmarks (e.g., QR codes). Path planning further divides into global planning for rough route generation and local planning for real-time obstacle avoidance and reaching intermediate waypoints.
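For orientation, the hypothetical sketch below illustrates how such a classical modular stack is typically decomposed; the class and method names are ours, not from any specific system:

```python
from typing import List, Tuple

Pose = Tuple[float, float, float]  # (x, y, heading) in the map frame


class ClassicalNavStack:
    """Hypothetical interface for the traditional, rule-based module split."""

    def locate_target(self, instruction: str) -> Pose:
        """Target localization: map a language or image cue to a goal pose on the map."""
        raise NotImplementedError

    def locate_self(self, observation) -> Pose:
        """Self-localization: estimate the robot's own pose, often aided by
        artificial landmarks (e.g., QR codes) in repetitive spaces like warehouses."""
        raise NotImplementedError

    def plan_global(self, start: Pose, goal: Pose) -> List[Pose]:
        """Global planning: generate a rough route of intermediate waypoints."""
        raise NotImplementedError

    def plan_local(self, waypoint: Pose, obstacles) -> Pose:
        """Local planning: real-time obstacle avoidance toward the next waypoint."""
        raise NotImplementedError
```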
While foundation models have shown promise in integrating smaller models to tackle broader tasks, the optimal number of models and their effective integration for comprehensive navigation remained an open question.
ByteDance’s Astra, detailed in their paper “Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning” (website: https://astra-mobility.github.io/), addresses these limitations. Following the System 1/System 2 paradigm, Astra features two primary sub-models: Astra-Global and Astra-Local. Astra-Global handles low-frequency tasks like target and self-localization, while Astra-Local manages high-frequency tasks such as local path planning and odometry estimation. This architecture promises to revolutionize how robots navigate complex indoor spaces.
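As a minimal sketch of that System 1/System 2 split (the stub classes, method names, and update rates below are our assumptions, not ByteDance's published interfaces), the idea is a slow global model issuing occasional fixes around a fast local control loop:

```python
class AstraGlobalStub:
    """Stand-in for the MLLM-based global model (low-frequency "System 2")."""
    def localize(self, image, prompt):
        # Would return a goal node from the offline map plus a 6-DoF pose estimate.
        return {"goal_node": "node_42", "pose_6dof": [0.0] * 6}


class AstraLocalStub:
    """Stand-in for the multi-task local model (high-frequency "System 1")."""
    def step(self, sensors, goal_node):
        # Would return a local trajectory / velocity command plus an odometry update.
        return {"velocity_cmd": (0.2, 0.0), "rel_pose": [0.0] * 6}


def navigate(global_model, local_model, read_sensors, ticks=200, relocalize_every=50):
    """Low-frequency global fixes wrapped around a high-frequency local control loop."""
    fix = global_model.localize(read_sensors()["image"], "go to the charging dock")
    for t in range(1, ticks + 1):
        if t % relocalize_every == 0:                 # occasional global re-localization
            fix = global_model.localize(read_sensors()["image"], "go to the charging dock")
        cmd = local_model.step(read_sensors(), fix["goal_node"])  # runs every control tick
    return cmd
```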
Astra-Global: The Intelligent Brain for Global Localization
Astra-Global serves as the intelligent core of the Astra architecture, responsible for critical low-frequency tasks: self-localization and target localization. It functions as a Multimodal Large Language Model (MLLM), adept at processing both visual and linguistic inputs to achieve precise global positioning within a map. Its strength lies in utilizing a hybrid topological-semantic graph as contextual input, allowing the model to accurately locate positions based on query images or text prompts.
Construction of this localization system begins with offline mapping: the research team developed an offline method to build a hybrid topological-semantic graph G = (V, E, L) with the following components (a minimal data-structure sketch follows the list):
- V (Nodes): Keyframes, obtained by temporally downsampling the input video, serve as nodes; each encodes a Structure-from-Motion (SfM)-estimated 6-Degrees-of-Freedom (DoF) camera pose and references to landmarks.
- E (Edges): Undirected edges establish connectivity based on relative node poses, crucial for global path planning.
- L (Landmarks): Semantic landmark information is extracted by Astra-Global from visual data at each node, enriching the map’s semantic understanding. These landmarks store semantic attributes and are connected to multiple nodes via co-visibility relationships.
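A minimal data-structure sketch of such a graph, assuming the node, edge, and landmark fields described above (field names and types are illustrative, not taken from the paper):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Pose6DoF = Tuple[float, float, float, float, float, float]  # x, y, z, roll, pitch, yaw


@dataclass
class Node:
    """Keyframe node: a temporally downsampled frame with an SfM-estimated pose."""
    node_id: str
    pose: Pose6DoF
    landmark_ids: List[str] = field(default_factory=list)       # co-visible landmarks


@dataclass
class Landmark:
    """Semantic landmark extracted by Astra-Global from the node's imagery."""
    landmark_id: str
    attributes: Dict[str, str]                                   # e.g. {"category": "sofa"}
    visible_from: List[str] = field(default_factory=list)        # node ids (co-visibility)


@dataclass
class HybridMap:
    """G = (V, E, L): keyframe nodes, connectivity edges, semantic landmarks."""
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Tuple[str, str]] = field(default_factory=list)   # undirected node pairs
    landmarks: Dict[str, Landmark] = field(default_factory=dict)

    def nodes_for_landmark(self, landmark_id: str) -> List[Node]:
        """Landmark-to-node association used during target localization."""
        return [self.nodes[n] for n in self.landmarks[landmark_id].visible_from]
```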
At run time, Astra-Global handles both self-localization and target localization with a coarse-to-fine, two-stage visual-language process. The coarse stage analyzes the input image and localization prompt, detects landmarks, establishes correspondences with the pre-built landmark map, and filters candidates based on visual consistency. The fine stage then uses the query image and the coarse output to sample reference nodes from the offline map, comparing their visual and positional information to directly output the predicted pose.
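Schematically, and reusing the HybridMap sketch above, the two stages could be chained as below; the `mllm.*` calls are hypothetical stand-ins for the model's prompted reasoning steps:

```python
def localize(query_image, prompt, hybrid_map, mllm):
    """Coarse-to-fine visual-language localization (schematic only)."""
    # Coarse stage: detect landmarks in the query, match them against the offline
    # landmark map, and keep candidates that remain visually consistent.
    detected = mllm.detect_landmarks(query_image, prompt)        # hypothetical call
    candidates = [lm for lm in hybrid_map.landmarks.values()
                  if mllm.is_consistent(detected, lm)]           # hypothetical call

    # Fine stage: sample reference nodes that co-observe the candidate landmarks,
    # then let the model compare their visual/positional info against the query.
    reference_nodes = []
    for lm in candidates:
        reference_nodes.extend(hybrid_map.nodes_for_landmark(lm.landmark_id))
    return mllm.predict_pose(query_image, reference_nodes)       # predicted 6-DoF pose
```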
For language-based target localization, the model interprets natural language instructions, identifies relevant landmarks using their functional descriptions within the map, and then leverages landmark-to-node association mechanisms to locate relevant nodes, retrieving target images and 6-DoF poses.
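For the language-only case, the same map supports a simple retrieval pattern; again, the grounding call is a hypothetical placeholder for the MLLM:

```python
def locate_target(instruction, hybrid_map, mllm):
    """Language-based target localization (schematic): instruction -> candidate goals."""
    # Ask the model which mapped landmarks match the instruction's functional
    # description (e.g., "where I can heat up food" -> a microwave landmark).
    landmark_ids = mllm.ground_instruction(instruction, hybrid_map.landmarks)  # hypothetical
    goals = []
    for lid in landmark_ids:
        # Landmark-to-node association: each node co-observing the landmark yields
        # a stored target image reference and its 6-DoF pose.
        goals.extend((node.node_id, node.pose) for node in hybrid_map.nodes_for_landmark(lid))
    return goals
```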
To empower Astra-Global with robust localization abilities, the team employed a meticulous training methodology. Using Qwen2.5-VL as the backbone, they combined Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO). SFT involved diverse datasets for various tasks, including coarse and fine localization, co-visibility detection, and motion trend estimation. In the GRPO phase, a rule-based reward function (including format, landmark extraction, map matching, and extra landmark rewards) was used to train for visual-language localization. Experiments showed GRPO significantly improved Astra-Global’s zero-shot generalization, achieving 99.9% localization accuracy in unseen home environments, surpassing SFT-only methods.
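The paper's exact reward terms and weights are not reproduced in this summary; the sketch below only illustrates how a rule-based reward of that shape could be composed (all weights, field names, and the sign of the extra-landmark term are assumptions):

```python
def localization_reward(response, ground_truth):
    """Hypothetical composition of the rule-based GRPO reward."""
    reward = 0.0
    # Format reward: the response must follow the expected answer template.
    if response.get("well_formatted", False):
        reward += 0.1
    # Landmark extraction reward: credit for each ground-truth landmark recovered.
    gt_landmarks = set(ground_truth["landmarks"])
    found = set(response.get("landmarks", [])) & gt_landmarks
    reward += 0.3 * len(found) / max(len(gt_landmarks), 1)
    # Map matching reward: did the predicted map node match the ground truth?
    if response.get("matched_node") == ground_truth["node_id"]:
        reward += 0.5
    # Extra-landmark term: here treated as a penalty on hallucinated landmarks.
    extra = set(response.get("landmarks", [])) - gt_landmarks
    reward -= 0.1 * len(extra)
    return reward
```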
Astra-Local: The Intelligent Assistant for Local Planning
Astra-Local acts as the intelligent assistant for Astra’s high-frequency tasks, a multi-task network capable of efficiently generating local paths and accurately estimating odometry from sensor data. Its architecture comprises three core components: a 4D spatio-temporal encoder, a planning head, and an odometry head.
The 4D spatio-temporal encoder replaces traditional mobile stack perception and prediction modules. It begins with a 3D spatial encoder that processes N omnidirectional images through a Vision Transformer (ViT) and Lift-Splat-Shoot to convert 2D image features into 3D voxel features. This 3D encoder is trained using self-supervised learning via 3D volumetric differentiable neural rendering. The 4D spatio-temporal encoder then builds upon the 3D encoder, taking past voxel features and future timestamps as input to predict future voxel features through ResNet and DiT modules, providing current and future environmental representations for planning and odometry.
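A shape-level sketch of that encoding path, with stand-in functions replacing the ViT, Lift-Splat-Shoot, and ResNet + DiT components (all tensor sizes are illustrative, not the paper's):

```python
import torch

C, X, Y, Z = 32, 40, 40, 8  # illustrative voxel-grid dimensions


def vit_stub(images):
    """Stand-in for the ViT backbone: per-view 2D feature maps."""
    return torch.randn(images.shape[0], C, 16, 16)


def lift_splat_shoot_stub(feats_2d):
    """Stand-in for Lift-Splat-Shoot: pools per-view 2D features into one voxel grid."""
    pooled = feats_2d.mean(dim=(0, 2, 3))                 # (C,)
    return pooled.view(C, 1, 1, 1).expand(C, X, Y, Z)     # broadcast into the 3D grid


def predict_future_stub(past_voxels, voxels_now, t):
    """Stand-in for the ResNet + DiT predictor, conditioned on a future timestamp t."""
    return voxels_now + 0.0 * t                           # placeholder dynamics


def encode_4d(images, past_voxels, future_timestamps):
    """3D spatial encoding of the current views, then 4D prediction of future voxels."""
    feats_2d = vit_stub(images)                           # (N, C, h, w) per-view features
    voxels_now = lift_splat_shoot_stub(feats_2d)          # (C, X, Y, Z) current scene
    future = [predict_future_stub(past_voxels, voxels_now, t) for t in future_timestamps]
    return voxels_now, future
```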
The planning head, conditioned on the pre-trained 4D features, robot speed, and task information, generates executable trajectories using Transformer-based flow matching. To prevent collisions, it incorporates a masked Euclidean Signed Distance Field (ESDF) loss: the ESDF is computed from a 3D occupancy map and masked with the 2D ground-truth trajectory, which significantly reduces collision rates. Experiments show superior collision rate and overall score on out-of-distribution (OOD) datasets compared with other methods.
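The precise loss formulation is not given in this summary; the 2D simplification below (using SciPy's distance transform, with the margin and grid parameters assumed) conveys the idea of penalizing predicted waypoints that fall close to obstacles, evaluated only inside a mask derived from the ground-truth trajectory:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def masked_esdf_loss(occupancy, trajectory, gt_mask, margin=0.5, cell_size=0.1):
    """Simplified 2D sketch of a masked ESDF collision penalty.

    occupancy:  (H, W) bool grid, True where space is occupied
    trajectory: (T, 2) predicted waypoints in grid (x, y) coordinates
    gt_mask:    (H, W) bool mask around the ground-truth trajectory
    """
    # Signed distance field: positive in free space, negative inside obstacles.
    esdf = (distance_transform_edt(~occupancy) - distance_transform_edt(occupancy)) * cell_size
    # Ignore everything outside the ground-truth trajectory mask.
    esdf = np.where(gt_mask, esdf, np.inf)
    penalty = 0.0
    for x, y in trajectory:
        d = esdf[int(round(y)), int(round(x))]
        penalty += max(0.0, margin - d)   # hinge: waypoints within `margin` of an obstacle
    return penalty / max(len(trajectory), 1)
```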
The odometry head predicts the robot’s relative pose using current and past 4D features and additional sensor data (e.g., IMU, wheel data). It trains a Transformer model to fuse information from the different sensors. Each sensor modality is processed by a specific tokenizer, combined with modality embeddings and temporal positional embeddings.
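As an illustration of that fusion pattern (this is our sketch, not Astra's implementation; dimensions, layer counts, and the pooling choice are assumptions), each modality stream is tokenized, tagged with modality and temporal embeddings, and passed through a shared Transformer encoder:

```python
import torch
import torch.nn as nn


class OdometryHeadSketch(nn.Module):
    """Illustrative multi-sensor fusion for relative-pose prediction."""

    def __init__(self, d_model=128, n_modalities=3, max_steps=16):
        super().__init__()
        # One tokenizer per modality (e.g., voxel features, IMU, wheel odometry).
        self.tokenizers = nn.ModuleList([nn.LazyLinear(d_model) for _ in range(n_modalities)])
        self.modality_emb = nn.Embedding(n_modalities, d_model)
        self.temporal_emb = nn.Embedding(max_steps, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.pose_head = nn.Linear(d_model, 6)   # relative 6-DoF pose

    def forward(self, streams):
        """streams: list of (B, T, feat_dim) tensors, one per sensor modality."""
        tokens = []
        for m, x in enumerate(streams):
            tok = self.tokenizers[m](x)                              # per-modality tokenizer
            tok = tok + self.modality_emb.weight[m]                  # which sensor this is
            tok = tok + self.temporal_emb(torch.arange(x.shape[1]))  # when it was observed
            tokens.append(tok)
        fused = self.encoder(torch.cat(tokens, dim=1))               # fuse all sensor tokens
        return self.pose_head(fused.mean(dim=1))                     # (B, 6) relative pose
```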
Content automatically truncated.
🔗 Source: syncedreview.com
📌 MAROKO133 Exclusive AI: Singapore to host world’s first airport testbed for open-fan engines
Singapore will host the world’s first airport testbed for next-generation open-fan aircraft engines. A memorandum of understanding to that effect was signed at the Changi Aviation Summit, held on the eve of the Singapore Airshow, marking a significant milestone both for global engine development efforts and for the city-state’s aviation ambitions.
The agreement, signed by the Civil Aviation Authority of Singapore, Airbus, and CFM International, will see Singapore conduct real-world evaluations of open-fan engines developed under CFM’s Revolutionary Innovation for Sustainable Engines program.
The initiative was confirmed ahead of the summit, with CAAS Director General Han Kok Juan briefing reporters on Jan 29. The work is expected to run for several years and will unfold at either Singapore Changi Airport or Seletar Airport.
Singapore becomes a real-world test site
Under the agreement, Singapore will serve as the first operational airport environment to study how open-fan engines could be integrated into daily aviation activities. Unlike traditional high-bypass turbofan engines, open-fan designs feature exposed fan blades, allowing for larger diameters with lower aerodynamic drag.
CAAS said the testbed will focus on developing a comprehensive readiness framework to support the introduction of these engines.
This includes assessing how aircraft and engine design choices interact with existing airport infrastructure, such as taxiways, gates, and maintenance areas. It will also study how operational procedures on the ground and in the air may need to change to support the new technology.
Both Changi Airport, one of the world’s busiest international hubs, and Seletar Airport, which supports business and regional aviation, are being considered. Using active airports is central to the effort, as it allows engineers and regulators to observe how open-fan aircraft perform in realistic, high-tempo environments.
RISE engines and next-generation aircraft
CFM first revealed the RISE concept in 2021, positioning it as a successor to today’s most advanced commercial jet engines. The program targets a 20 percent improvement in fuel efficiency compared with current engines, a key goal as airlines face rising fuel costs and tighter emissions requirements.
The RISE architecture is designed to remain compatible with sustainable aviation fuel and to support longer-term pathways toward hydrogen-based propulsion.
Airbus has already indicated that its next-generation single-aisle aircraft, expected to enter service in the second half of the 2030s, could feature an open-fan engine. That aircraft is expected to sit in the 200-seat class, a core segment of the global airline market.
By linking engine development with airport operations early, the partners aim to reduce barriers to entry when the technology becomes commercially available.
Safety, regulation, and operational readiness
Open-fan engines also introduce new safety and regulatory considerations. Exposed fan blades change how ground crews interact with aircraft and require careful assessment of noise, debris, and maintenance procedures. CAAS said the readiness framework will therefore extend beyond engineering.
The work will cover safety standards, regulatory processes, and the training required for airport and airline staff. It will also examine how maintenance routines and day-to-day airport operations can adapt without disrupting existing traffic flows.
“This first-of-its-kind agreement is a huge boon for the CFM RISE development program,” CFM President and CEO Gaël Méheust said in a statement.
“Being able to perform real-world demonstrations from ground handling and maintenance activities to day-to-day airport operations will help build confidence among airlines, regulators and, ultimately, the flying public in the safety, durability and efficiency of open-fan technology.”
Strengthening Singapore’s aerospace role
For CAAS, the collaboration fits into a broader strategy to establish Singapore as a global hub for aerospace innovation. By combining a strong regulatory framework with a dense airport ecosystem, the authority aims to accelerate the testing and deployment of advanced aviation technologies.
Han said CAAS will finalize the project scope and timeline with CFM as the program moves forward. The multi-year effort is expected to generate insights not only for engine manufacturers but also for airports and airlines worldwide as they prepare for the next generation of aircraft.
🔗 Source: interestingengineering.com
🤖 MAROKO133 Note
This article is an automatic summary compiled from several trusted sources. We pick trending topics so you always stay up to date.
✅ Next update in 30 minutes: a random topic awaits!
