MAROKO133 Hot ai: ByteDance Introduces Astra: A Dual-Model Architecture for Autonomous Rob

📌 MAROKO133 Update ai: ByteDance Introduces Astra: A Dual-Model Architecture for A

The increasing integration of robots across various sectors, from industrial manufacturing to daily life, highlights a growing need for advanced navigation systems. However, contemporary robot navigation systems face significant challenges in diverse and complex indoor environments, exposing the limitations of traditional approaches. Addressing the fundamental questions of “Where am I?”, “Where am I going?”, and “How do I get there?”, ByteDance has developed Astra, an innovative dual-model architecture designed to overcome these traditional navigation bottlenecks and enable general-purpose mobile robots.

Traditional navigation systems typically consist of multiple, smaller, and often rule-based modules to handle the core challenges of target localization, self-localization, and path planning. Target localization involves understanding natural language or image cues to pinpoint a destination on a map. Self-localization requires a robot to determine its precise position within a map, especially challenging in repetitive environments like warehouses where traditional methods often rely on artificial landmarks (e.g., QR codes). Path planning further divides into global planning for rough route generation and local planning for real-time obstacle avoidance and reaching intermediate waypoints.

While foundation models have shown promise in integrating smaller models to tackle broader tasks, the optimal number of models and their effective integration for comprehensive navigation remained an open question.

ByteDance’s Astra, detailed in their paper “Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning” (website: https://astra-mobility.github.io/), addresses these limitations. Following the System 1/System 2 paradigm, Astra features two primary sub-models: Astra-Global and Astra-Local. Astra-Global handles low-frequency tasks like target and self-localization, while Astra-Local manages high-frequency tasks such as local path planning and odometry estimation. This architecture promises to revolutionize how robots navigate complex indoor spaces.

Astra-Global: The Intelligent Brain for Global Localization

Astra-Global serves as the intelligent core of the Astra architecture, responsible for critical low-frequency tasks: self-localization and target localization. It functions as a Multimodal Large Language Model (MLLM), adept at processing both visual and linguistic inputs to achieve precise global positioning within a map. Its strength lies in utilizing a hybrid topological-semantic graph as contextual input, allowing the model to accurately locate positions based on query images or text prompts.

The construction of this robust localization system begins with offline mapping. The research team developed an offline method to build a hybrid topological-semantic graph G=(V,E,L):

  • V (Nodes): Keyframes are obtained by temporally downsampling the input video. Each keyframe, together with its SfM-estimated 6-degrees-of-freedom (6-DoF) camera pose, acts as a node that encodes the camera pose and landmark references.
  • E (Edges): Undirected edges establish connectivity based on relative node poses, crucial for global path planning.
  • L (Landmarks): Semantic landmark information is extracted by Astra-Global from visual data at each node, enriching the map’s semantic understanding. These landmarks store semantic attributes and are connected to multiple nodes via co-visibility relationships.
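
As a rough illustration of the G=(V,E,L) structure described above, the graph could be held in a few small Python containers. This is only a sketch: the class names, field names, and the pose layout (x, y, z, roll, pitch, yaw) are assumptions for illustration and are not taken from the Astra paper.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    pose: tuple  # assumed 6-DoF layout: (x, y, z, roll, pitch, yaw)
    landmark_ids: set = field(default_factory=set)  # landmark references


@dataclass
class Landmark:
    landmark_id: int
    attributes: str  # semantic attributes, kept as free text here
    node_ids: set = field(default_factory=set)  # co-visible nodes


@dataclass
class HybridGraph:
    nodes: dict = field(default_factory=dict)      # V
    edges: set = field(default_factory=set)        # E, undirected
    landmarks: dict = field(default_factory=dict)  # L

    def add_node(self, node_id, pose):
        self.nodes[node_id] = Node(node_id, pose)

    def add_edge(self, a, b):
        # A frozenset makes (a, b) and (b, a) the same undirected edge.
        self.edges.add(frozenset((a, b)))

    def add_landmark(self, landmark_id, attributes, visible_from):
        self.landmarks[landmark_id] = Landmark(
            landmark_id, attributes, set(visible_from)
        )
        # Record the co-visibility link on the node side as well.
        for nid in visible_from:
            self.nodes[nid].landmark_ids.add(landmark_id)

    def neighbors(self, node_id):
        return sorted(
            n for e in self.edges if node_id in e for n in e if n != node_id
        )


graph = HybridGraph()
for i in range(3):
    graph.add_node(i, (float(i), 0.0, 0.0, 0.0, 0.0, 0.0))
graph.add_edge(0, 1)
graph.add_edge(1, 2)
graph.add_landmark(0, "red sofa, used for sitting", visible_from=[0, 1])
```

Storing the landmark-to-node links in both directions keeps lookups cheap whether the query starts from a landmark (target localization) or from a node (self-localization).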

In practical localization, Astra-Global’s self-localization and target localization capabilities leverage a coarse-to-fine two-stage process for visual-language localization. The coarse stage analyzes input images and localization prompts, detects landmarks, establishes correspondence with a pre-built landmark map, and filters candidates based on visual consistency. The fine stage then uses the query image and coarse output to sample reference map nodes from the offline map, comparing their visual and positional information to directly output the predicted pose.

For language-based target localization, the model interprets natural language instructions, identifies relevant landmarks using their functional descriptions within the map, and then leverages landmark-to-node association mechanisms to locate relevant nodes, retrieving target images and 6-DoF poses.
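
The landmark-to-node lookup in this step can be sketched in a few lines. In Astra the matching is done by the multimodal model itself; the word-overlap score below is a deliberately crude stand-in, and the function name, data layout, and sample landmarks are all hypothetical.

```python
def locate_target(instruction, landmarks, node_poses):
    """Return (node_id, pose) pairs for the landmark that best matches.

    landmarks:  {landmark_id: (functional_description, [node_ids])}
    node_poses: {node_id: pose}
    Matching is plain word overlap, a stand-in for the MLLM.
    """
    query = set(instruction.lower().split())
    best_id, best_score = None, 0
    for landmark_id, (description, _) in landmarks.items():
        score = len(query & set(description.lower().split()))
        if score > best_score:
            best_id, best_score = landmark_id, score
    if best_id is None:
        return []  # no landmark shares a word with the instruction
    # Landmark-to-node association: follow the stored node ids.
    return [(nid, node_poses[nid]) for nid in landmarks[best_id][1]]


landmarks = {
    0: ("coffee machine for making drinks", [2, 3]),
    1: ("printer for printing documents", [5]),
}
node_poses = {
    2: (1.0, 2.0, 0.0, 0.0, 0.0, 0.0),
    3: (1.5, 2.0, 0.0, 0.0, 0.0, 0.0),
    5: (4.0, 0.5, 0.0, 0.0, 0.0, 0.0),
}
targets = locate_target("go to the coffee machine", landmarks, node_poses)
```

Here `targets` holds the poses of nodes 2 and 3, the two keyframes from which the matched landmark is visible.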

To empower Astra-Global with robust localization abilities, the team employed a meticulous training methodology. Using Qwen2.5-VL as the backbone, they combined Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO). SFT involved diverse datasets for various tasks, including coarse and fine localization, co-visibility detection, and motion trend estimation. In the GRPO phase, a rule-based reward function (including format, landmark extraction, map matching, and extra landmark rewards) was used to train for visual-language localization. Experiments showed GRPO significantly improved Astra-Global’s zero-shot generalization, achieving 99.9% localization accuracy in unseen home environments, surpassing SFT-only methods.
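
To make the GRPO step more concrete, the sketch below shows the group-relative part of the method: several responses are sampled for the same prompt, each is scored by a rule-based reward, and each reward is normalized against the group's mean and standard deviation. The four reward component names follow the article, but the equal weighting, the sample scores, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev

# Component names follow the article; equal weights are an assumption.
COMPONENTS = ("format", "landmark_extraction", "map_matching", "extra_landmark")


def total_reward(scores):
    """Sum the rule-based reward components for one sampled response."""
    return sum(scores[name] for name in COMPONENTS)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical responses sampled for the same localization prompt.
group = [
    {"format": 1.0, "landmark_extraction": 1.0, "map_matching": 1.0, "extra_landmark": 0.0},
    {"format": 1.0, "landmark_extraction": 0.0, "map_matching": 0.0, "extra_landmark": 0.0},
    {"format": 1.0, "landmark_extraction": 1.0, "map_matching": 1.0, "extra_landmark": 0.0},
    {"format": 1.0, "landmark_extraction": 0.0, "map_matching": 0.0, "extra_landmark": 0.0},
]
rewards = [total_reward(s) for s in group]
advantages = group_relative_advantages(rewards)
```

Responses that beat the group average receive a positive advantage and the rest a negative one, so no separate value network is needed to provide a baseline.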

Astra-Local: The Intelligent Assistant for Local Planning

Astra-Local handles Astra’s high-frequency tasks. It is a multi-task network that generates local paths and estimates odometry from sensor data. Its architecture comprises three core components: a 4D spatio-temporal encoder, a planning head, and an odometry head.

The 4D spatio-temporal encoder replaces the perception and prediction modules of a traditional mobile robotics stack. It begins with a 3D spatial encoder that processes N omnidirectional images through a Vision Transformer (ViT) and uses Lift-Splat-Shoot to convert 2D image features into 3D voxel features. This 3D encoder is trained with self-supervised learning via 3D volumetric differentiable neural rendering. The 4D spatio-temporal encoder then builds on the 3D encoder: it takes past voxel features and future timestamps as input and predicts future voxel features through ResNet and DiT modules, providing current and future environmental representations for planning and odometry.

The planning head takes the pre-trained 4D features, robot speed, and task information as input and generates executable trajectories using Transformer-based flow matching. To prevent collisions, it incorporates a masked Euclidean Signed Distance Field (ESDF) loss. This loss calculates the ESDF of a 3D occupancy map and applies a 2D ground truth trajectory mask, significantly reducing collision rates. Experiments show that it outperforms other methods in collision rate and overall score on out-of-distribution (OOD) datasets.
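
For readers unfamiliar with flow matching, the sketch below shows the general idea on a toy trajectory: a model learns a velocity field that carries a noise sample to a target along a straight-line path, and sampling integrates that field with Euler steps. It uses the common linear-interpolation formulation, not necessarily the exact variant in Astra, and it replaces the Transformer with a closed-form oracle; all function names are hypothetical.

```python
def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 to target x1 at time t."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]


def flow_matching_loss(predict_velocity, x0, x1, t):
    """Mean squared error between predicted and target velocity."""
    xt = interpolate(x0, x1, t)
    v_pred = predict_velocity(xt, t)
    v_true = [b - a for a, b in zip(x0, x1)]  # velocity of the linear path
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(v_true)


def sample(predict_velocity, x0, steps=10):
    """Generate a trajectory by Euler integration from t=0 to t=1."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        v = predict_velocity(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# A flattened toy trajectory: three (x, y) waypoints.
target = [0.5, 0.0, 1.0, 0.1, 1.5, 0.3]
noise = [0.2, -0.4, 0.9, 0.7, -0.3, 0.1]


def oracle_velocity(x, t):
    # Stand-in for the trained network: exact field for a single target.
    return [(b - a) / (1.0 - t) for a, b in zip(x, target)]


loss = flow_matching_loss(oracle_velocity, noise, target, t=0.3)
trajectory = sample(oracle_velocity, noise)
```

In a real planner the velocity network would also be conditioned on the 4D features, robot speed, and task information, and a collision term such as the masked ESDF loss would be added to the training objective.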

The odometry head predicts the robot’s relative pose using current and past 4D features and additional sensor data (e.g., IMU, wheel data). It trains a Transformer model to fuse information from different sensors. Each sensor modality is processed by a specific tokenizer, combined with modality embeddings and temporal positional embeddi…

Content automatically shortened.

🔗 Source: syncedreview.com


📌 MAROKO133 Breaking ai: AI is moving to the edge – and network security needs to

Presented by T-Mobile for Business


Small and mid-sized businesses are adopting AI at a pace that would have seemed unrealistic even a few years ago. Smart assistants that greet customers, predictive tools that flag inventory shortages before they happen, and on-site analytics that help staff make decisions faster — these used to be features of the enterprise. Now they’re being deployed in retail storefronts, regional medical clinics, branch offices, and remote operations hubs.

What’s changed is not just the AI itself, but where it runs. Increasingly, AI workloads are being pushed out of centralized data centers and into the real world — into the places where employees work and customers interact. This shift to the edge promises faster insights and more resilient operations, but it also transforms the demands placed on the network. Edge sites need consistent bandwidth, real-time data pathways, and the ability to process information locally rather than relying on the cloud for every decision.

The catch is that as companies race to connect these locations, security often lags behind. A store may adopt AI-enabled cameras or sensors long before it has the policies to manage them. A clinic may roll out mobile diagnostic devices without fully segmenting their traffic. A warehouse may rely on a mix of Wi-Fi, wired, and cellular connections that weren’t designed to support AI-driven operations. When connectivity scales faster than security, it creates cracks — unmonitored devices, inconsistent access controls, and unsegmented data flows that make it hard to see what’s happening, let alone protect it.

Edge AI only delivers its full value when connectivity and security evolve together.

Why AI is moving to the edge — and what that breaks

Businesses are shifting AI to the edge for three core reasons:

  • Real-time responsiveness: Some decisions can’t wait for a round trip to the cloud. Whether it’s identifying an item on a shelf, detecting an abnormal reading from a medical device, or recognizing a safety risk in a warehouse aisle, the delay introduced by centralized processing can mean missed opportunities or slow reactions.

  • Resilience and privacy: Keeping data and inference local makes operations less vulnerable to outages or latency spikes, and it reduces the flow of sensitive information across networks. This helps SMBs meet data sovereignty and compliance requirements without rewriting their entire infrastructure.

  • Mobility and deployment speed: Many SMBs operate across distributed footprints — remote workers, pop-up locations, seasonal operations, or mobile teams. Wireless-first connectivity, including 5G business lines, lets them deploy AI tools quickly without waiting for fixed circuits or expensive buildouts.

Technologies like Edge Control from T-Mobile for Business fit naturally into this model. By routing traffic directly along the paths it needs — keeping latency-sensitive workloads local and bypassing the bottlenecks that traditional VPNs introduce — businesses can adopt edge AI without dragging their network into constant contention.

Yet the shift introduces new risk. Every edge site becomes, in effect, its own small data center. A retail store may have cameras, sensors, POS systems, digital signage, and staff devices all sharing the same access point. A clinic may run diagnostic tools, tablets, wearables, and video consult systems side by side. A manufacturing floor might combine robotics, sensors, handheld scanners, and on-site analytics platforms.

This diversity increases the attack surface dramatically. Many SMBs roll out connectivity first, then add piecemeal security later — leaving the blind spots attackers rely on.

Zero trust becomes essential at the edge

When AI is distributed across dozens or hundreds of sites, the old idea of a single secure “inside” network breaks down. Every store, clinic, kiosk, or field location becomes its own micro-environment — and every device within it becomes its own potential entry point.

Zero trust offers a framework to make this manageable.

At the edge, zero trust means:

  • Verifying identity rather than location — access is granted because a user or device proves who it is, not because it sits behind a corporate firewall.

  • Continuous authentication — trust isn’t permanent; it’s re-evaluated throughout a session.

  • Segmentation that limits movement — if something goes wrong, attackers can’t jump freely from system to system.
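
The three principles above can be sketched as a small access gate. This is an illustrative toy, not the API of any T-Mobile for Business product: the class, the device and segment names, and the five-minute re-verification window are all hypothetical.

```python
import hmac
from dataclasses import dataclass


@dataclass
class Session:
    device_id: str
    segment: str
    verified_at: float  # timestamp of the last identity check


class ZeroTrustGate:
    def __init__(self, known_devices, allowed_flows, max_age_s=300):
        self.known_devices = known_devices  # device_id -> (credential, segment)
        self.allowed_flows = allowed_flows  # set of (src_segment, dst_segment)
        self.max_age_s = max_age_s

    def authenticate(self, device_id, credential, now):
        """Identity, not location: the source network is never consulted."""
        entry = self.known_devices.get(device_id)
        if entry is None or not hmac.compare_digest(entry[0], credential):
            return None
        return Session(device_id, entry[1], now)

    def authorize(self, session, dst_segment, now):
        # Continuous authentication: stale sessions must re-verify.
        if now - session.verified_at > self.max_age_s:
            return False
        # Segmentation: only explicitly allowed flows may cross segments.
        return (session.segment, dst_segment) in self.allowed_flows


gate = ZeroTrustGate(
    known_devices={"camera-01": ("s3cret", "cameras")},
    allowed_flows={("cameras", "video-analytics")},
)
session = gate.authenticate("camera-01", "s3cret", now=0.0)
can_stream = gate.authorize(session, "video-analytics", now=60.0)
can_reach_pos = gate.authorize(session, "pos", now=60.0)
expired = gate.authorize(session, "video-analytics", now=600.0)
```

A compromised camera in this model can still reach the analytics segment until its session expires, but it cannot reach the point-of-sale segment at all, which is the containment property segmentation is meant to provide.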

This approach is especially critical given that many edge devices can’t run traditional security clients. SIM-based identity and secure mobile connectivity — areas where T-Mobile for Business brings significant strength — help verify IoT devices, 5G routers, and sensors that otherwise sit outside the visibility of IT teams.

This is why connectivity providers are increasingly combining networking and security into a single approach. T-Mobile for Business embeds segmentation, device visibility, and zero-trust safeguards directly into its wireless-first connectivity offerings, reducing the need for SMBs to stitch together multiple tools.

Secure-by-default networks reshape the landscape

A major architectural shift is underway: networks that assume every device, session, and workload must be authenticated, segmented, and monitored from the start. Instead of building security on top of connectivity, the two are fused.

T-Mobile for Business’s solutions show how this is evolving. Its SASE platform, powered by Palo Alto Networks Prisma SASE 5G, blends secure access with connectivity into one cloud-delivered service. Private Access gives users the least-privileged access they need, nothing more. T-SIMsecure authenticates devices at the SIM layer, allowing IoT sensors and 5G routers to be verified automatically. Security Slice isolates sensitive SASE traffic on a dedicated portion of the 5G network, ensuring consistency even during heavy demand.

A unified dashboard like T-Platform brings it together, offering real-time visibility across SASE, IoT, business internet, and edge control — simplifying operations for SMBs with limited staff.

The future: AI that runs the edge and protects it

As AI models become more dynamic and autonomous, we’ll see the relationship flip: the edge won’t just support AI; AI will actively run and secure the edge — optimizing traffic paths, adjusting segmentation automatically, and spotting anomalies that matter to one specific store or site.

Self-healing networks and adaptive policy engines will move from experimental to expected.

For SMBs, this is a pivotal moment. The organizations that modernize their connectivity and security foundations now will be the ones best positioned to scale AI everywhere — safely, confidently, and without unnecessary complexity.

Partners like T-Mobile for Business are already moving in this direction, giving SMBs a way to deploy AI at the edge without sacrificing control or visibility.


Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact [email protected].

🔗 Source: venturebeat.com


🤖 MAROKO133 Note

This article is an automatic summary compiled from several trusted sources. We pick trending topics so you always stay up to date.


Author: timuna