MAROKO133 Hot ai: ByteDance Introduces Astra: A Dual-Model Architecture for Auto…
The increasing integration of robots across various sectors, from industrial manufacturing to daily life, highlights a growing need for advanced navigation systems. However, contemporary robot navigation systems face significant challenges in diverse and complex indoor environments, exposing the limitations of traditional approaches. Addressing the fundamental questions of “Where am I?”, “Where am I going?”, and “How do I get there?”, ByteDance has developed Astra, an innovative dual-model architecture designed to overcome these traditional navigation bottlenecks and enable general-purpose mobile robots.
Traditional navigation systems typically consist of multiple, smaller, and often rule-based modules to handle the core challenges of target localization, self-localization, and path planning. Target localization involves understanding natural language or image cues to pinpoint a destination on a map. Self-localization requires a robot to determine its precise position within a map, especially challenging in repetitive environments like warehouses where traditional methods often rely on artificial landmarks (e.g., QR codes). Path planning further divides into global planning for rough route generation and local planning for real-time obstacle avoidance and reaching intermediate waypoints.
While foundation models have shown promise in integrating smaller models to tackle broader tasks, the optimal number of models and their effective integration for comprehensive navigation remained an open question.
ByteDance’s Astra, detailed in their paper “Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning” (website: https://astra-mobility.github.io/), addresses these limitations. Following the System 1/System 2 paradigm, Astra features two primary sub-models: Astra-Global and Astra-Local. Astra-Global handles low-frequency tasks like target and self-localization, while Astra-Local manages high-frequency tasks such as local path planning and odometry estimation. This architecture promises to revolutionize how robots navigate complex indoor spaces.
Astra-Global: The Intelligent Brain for Global Localization
Astra-Global serves as the intelligent core of the Astra architecture, responsible for critical low-frequency tasks: self-localization and target localization. It functions as a Multimodal Large Language Model (MLLM), adept at processing both visual and linguistic inputs to achieve precise global positioning within a map. Its strength lies in utilizing a hybrid topological-semantic graph as contextual input, allowing the model to accurately locate positions based on query images or text prompts.
The construction of this robust localization system begins with offline mapping. The research team developed an offline method to build a hybrid topological-semantic graph G=(V,E,L):
- V (Nodes): Keyframes, obtained by temporal downsampling of input video and SfM-estimated 6-Degrees-of-Freedom (DoF) camera poses, act as nodes encoding camera poses and landmark references.
- E (Edges): Undirected edges establish connectivity based on relative node poses, crucial for global path planning.
- L (Landmarks): Semantic landmark information is extracted by Astra-Global from visual data at each node, enriching the map’s semantic understanding. These landmarks store semantic attributes and are connected to multiple nodes via co-visibility relationships.
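The hybrid map structure described above can be sketched as a small data model. All class and field names here are illustrative assumptions, not taken from the Astra codebase:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: int
    pose: tuple                                       # SfM-estimated 6-DoF camera pose
    landmark_ids: list = field(default_factory=list)  # landmarks co-visible from this keyframe

@dataclass
class Landmark:
    landmark_id: int
    attributes: dict       # semantic attributes extracted by the MLLM, e.g. {"type": "door"}
    node_ids: list = field(default_factory=list)      # co-visibility links back to nodes

class TopoSemanticMap:
    """Toy container for the hybrid topological-semantic graph G = (V, E, L)."""
    def __init__(self):
        self.nodes, self.edges, self.landmarks = {}, set(), {}

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, a, b):
        # undirected connectivity edge, used for global path planning
        self.edges.add(frozenset((a, b)))

    def add_landmark(self, lm):
        self.landmarks[lm.landmark_id] = lm
        for nid in lm.node_ids:                       # maintain bidirectional co-visibility
            self.nodes[nid].landmark_ids.append(lm.landmark_id)

    def nodes_for_landmark(self, landmark_id):
        # landmark-to-node association used during target localization
        return [self.nodes[n] for n in self.landmarks[landmark_id].node_ids]
```

The key design point the sketch captures is that landmarks are shared across nodes via co-visibility, so a semantic query can fan out to every keyframe that observes the matched landmark.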
In practical localization, Astra-Global’s self-localization and target localization capabilities leverage a coarse-to-fine two-stage process for visual-language localization. The coarse stage analyzes input images and localization prompts, detects landmarks, establishes correspondence with a pre-built landmark map, and filters candidates based on visual consistency. The fine stage then uses the query image and coarse output to sample reference map nodes from the offline map, comparing their visual and positional information to directly output the predicted pose.
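The coarse-to-fine flow can be sketched roughly as follows; the map is reduced to plain dictionaries, and `detect_fn`/`score_fn` are hypothetical stand-ins for Astra-Global's actual MLLM calls:

```python
def coarse_stage(query_image, landmark_map, detect_fn):
    """Detect landmarks in the query image and match them against the map.
    landmark_map: {landmark_type: [node_id, ...]} co-visibility index."""
    detected = detect_fn(query_image)                 # e.g. ["door", "extinguisher"]
    candidates = set()
    for lm_type in detected:                          # visual-consistency filter
        candidates.update(landmark_map.get(lm_type, []))
    return sorted(candidates)

def fine_stage(query_image, candidate_ids, node_poses, score_fn):
    """Compare the query against sampled reference nodes, output the best pose.
    node_poses: {node_id: 6-DoF pose}."""
    best = max(candidate_ids, key=lambda nid: score_fn(query_image, nid), default=None)
    return None if best is None else node_poses[best]

def localize(query_image, landmark_map, node_poses, detect_fn, score_fn):
    coarse = coarse_stage(query_image, landmark_map, detect_fn)
    return fine_stage(query_image, coarse, node_poses, score_fn)
```

In the real system both stages are MLLM inferences over images and map context; the dictionaries here only mimic the candidate-narrowing structure of the pipeline.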
For language-based target localization, the model interprets natural language instructions, identifies relevant landmarks using their functional descriptions within the map, and then leverages landmark-to-node association mechanisms to locate relevant nodes, retrieving target images and 6-DoF poses.
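A minimal sketch of that landmark-to-node lookup, assuming a hypothetical `match_fn` in place of the MLLM's instruction-to-description matching:

```python
def locate_target(instruction, landmarks, nodes, match_fn):
    """Resolve a language instruction to target nodes (images + 6-DoF poses).
    landmarks: {lm_id: {"desc": functional description, "node_ids": [...]}}
    nodes:     {node_id: {"image": path, "pose": 6-DoF pose}}
    match_fn:  hypothetical MLLM call scoring instruction vs. description."""
    best_lm = max(landmarks, key=lambda i: match_fn(instruction, landmarks[i]["desc"]))
    # landmark-to-node association: fan out to all nodes observing the landmark
    return [nodes[nid] for nid in landmarks[best_lm]["node_ids"]]
```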
To empower Astra-Global with robust localization abilities, the team employed a meticulous training methodology. Using Qwen2.5-VL as the backbone, they combined Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO). SFT involved diverse datasets for various tasks, including coarse and fine localization, co-visibility detection, and motion trend estimation. In the GRPO phase, a rule-based reward function (including format, landmark extraction, map matching, and extra landmark rewards) was used to train for visual-language localization. Experiments showed GRPO significantly improved Astra-Global’s zero-shot generalization, achieving 99.9% localization accuracy in unseen home environments, surpassing SFT-only methods.
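The rule-based reward can be illustrated with a toy composite over the four terms named above; the answer format, weights, and caps below are assumptions for illustration, not the paper's exact definitions:

```python
import re

def format_reward(response):
    # reward well-formed answers, e.g. landmarks wrapped in <landmarks>...</landmarks>
    return 1.0 if re.search(r"<landmarks>.*</landmarks>", response, re.S) else 0.0

def landmark_reward(predicted, ground_truth):
    # fraction of ground-truth landmarks the model extracted
    return len(set(predicted) & set(ground_truth)) / max(len(ground_truth), 1)

def map_match_reward(predicted, map_landmarks):
    # fraction of predictions that actually exist in the pre-built map
    if not predicted:
        return 0.0
    return len(set(predicted) & set(map_landmarks)) / len(predicted)

def extra_landmark_reward(predicted, ground_truth, bonus=0.1, cap=0.3):
    # small bonus for valid landmarks beyond the required set, capped
    extra = len(set(predicted) - set(ground_truth))
    return min(bonus * extra, cap)

def total_reward(response, predicted, ground_truth, map_landmarks):
    return (format_reward(response)
            + landmark_reward(predicted, ground_truth)
            + map_match_reward(predicted, map_landmarks)
            + extra_landmark_reward(predicted, ground_truth))
```

In GRPO, a scalar reward of this shape is computed per sampled response and advantages are normalized within each group of samples, which is what lets a purely rule-based signal train the localization behavior.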
Astra-Local: The Intelligent Assistant for Local Planning
Astra-Local acts as the intelligent assistant for Astra’s high-frequency tasks, a multi-task network capable of efficiently generating local paths and accurately estimating odometry from sensor data. Its architecture comprises three core components: a 4D spatio-temporal encoder, a planning head, and an odometry head.
The 4D spatio-temporal encoder replaces traditional mobile stack perception and prediction modules. It begins with a 3D spatial encoder that processes N omnidirectional images through a Vision Transformer (ViT) and Lift-Splat-Shoot to convert 2D image features into 3D voxel features. This 3D encoder is trained using self-supervised learning via 3D volumetric differentiable neural rendering. The 4D spatio-temporal encoder then builds upon the 3D encoder, taking past voxel features and future timestamps as input to predict future voxel features through ResNet and DiT modules, providing current and future environmental representations for planning and odometry.
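The 2D-to-3D lifting step can be illustrated with a toy Lift-Splat-style routine: per-pixel features are weighted by a predicted depth distribution and splatted into a voxel grid. The camera model and shapes are deliberately simplified assumptions:

```python
import numpy as np

def lift_splat(feats, depth_probs, pixel_rays, depth_bins, grid_shape, voxel_size):
    """
    feats:       (P, C)  per-pixel image features from the ViT
    depth_probs: (P, D)  per-pixel categorical distribution over depth bins
    pixel_rays:  (P, 3)  ray direction of each pixel in the world frame
    depth_bins:  (D,)    metric depth of each bin
    Returns a (X, Y, Z, C) voxel feature volume.
    """
    P, C = feats.shape
    vol = np.zeros(grid_shape + (C,))
    for p in range(P):
        for d, depth in enumerate(depth_bins):
            point = pixel_rays[p] * depth                 # 3D point along the ray
            idx = np.floor(point / voxel_size).astype(int)
            if np.all(idx >= 0) and np.all(idx < np.array(grid_shape)):
                vol[tuple(idx)] += depth_probs[p, d] * feats[p]   # splat into voxel
    return vol
```

The real encoder does this densely and differentiably on the GPU and then trains the resulting volume with neural-rendering self-supervision; the loop above only shows the geometry of the lift-and-splat operation.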
The planning head, based on pre-trained 4D features, robot speed, and task information, generates executable trajectories using Transformer-based flow matching. To prevent collisions, the planning head incorporates a masked Euclidean Signed Distance Field (ESDF) loss. This loss calculates the ESDF of a 3D occupancy map and applies a 2D ground-truth trajectory mask, significantly reducing collision rates. Experiments demonstrate its superior performance in collision rate and overall score on out-of-distribution (OOD) datasets compared to other methods.
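The masked-ESDF idea can be sketched as follows; the brute-force distance transform, hinge margin, and mask semantics are illustrative assumptions rather than the paper's exact loss:

```python
import numpy as np

def esdf(occupancy, cell=1.0):
    """Unsigned distance (in metric units) to the nearest occupied cell,
    computed brute-force for a small grid."""
    occ = np.argwhere(occupancy)                       # (N, ndim) occupied cells
    grid = np.indices(occupancy.shape).reshape(occupancy.ndim, -1).T
    d = np.sqrt(((grid[:, None, :] - occ[None, :, :]) ** 2).sum(-1)).min(1)
    return (d * cell).reshape(occupancy.shape)

def masked_esdf_loss(traj_cells, esdf_grid, gt_mask, margin=1.0):
    """Penalize predicted trajectory cells closer than `margin` to obstacles,
    skipping cells covered by the 2D ground-truth trajectory mask."""
    loss = 0.0
    for (i, j) in traj_cells:
        if gt_mask[i, j]:                              # GT already passes here
            continue
        loss += max(0.0, margin - esdf_grid[i, j])     # hinge on clearance
    return loss
```

The mask is the crucial part: clearance is only enforced where the demonstrated trajectory does not itself pass, so the loss discourages collisions without contradicting the ground truth.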
The odometry head predicts the robot’s relative pose using current and past 4D features and additional sensor data (e.g., IMU, wheel data). It trains a Transformer model to fuse information from the different sensors: each sensor modality is processed by a modality-specific tokenizer, combined with modality embeddings and temporal positional embeddings…
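The tokenize-and-fuse pattern can be sketched schematically; a single attention step stands in for the full Transformer, the "learned" projections are fixed random matrices, and all names and shapes are assumptions:

```python
import numpy as np

def tokenize(signal, dim, seed):
    """Hypothetical per-modality tokenizer: project raw samples to `dim`."""
    rng = np.random.default_rng(seed)                 # fixed stand-in for a learned projection
    W = rng.standard_normal((len(signal), dim))
    return np.asarray(signal) @ W                     # one (dim,) token per modality

def fuse(tokens, mod_embeds):
    """Add modality embeddings, then one self-attention pooling step."""
    X = np.stack([t + e for t, e in zip(tokens, mod_embeds)])   # (M, dim) token sequence
    scores = X @ X.T / np.sqrt(X.shape[1])            # scaled dot-product attention
    A = np.exp(scores - scores.max(1, keepdims=True))
    A /= A.sum(1, keepdims=True)                      # row-wise softmax
    return (A @ X).mean(0)                            # pooled feature for the pose head

dim = 8
imu_tok   = tokenize([0.1, 0.0, 9.8, 0.01, 0.02, 0.0], dim, seed=0)   # accel + gyro sample
wheel_tok = tokenize([0.5, 0.5], dim, seed=1)                          # wheel speeds
fused = fuse([imu_tok, wheel_tok], [np.zeros(dim), np.ones(dim)])      # modality embeddings
```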
Content automatically truncated.
Source: syncedreview.com
MAROKO133 Hot ai: Trump’s Grip on Reality Questioned After He Shares and Then Deletes Bizarre AI-Generated Video
President Donald Trump shared a bizarre and clearly AI-generated video in a late Saturday evening Truth Social post.
The post, which was mysteriously deleted hours later but can still be accessed via an archived version, shows a fake Fox News clip featuring Trump’s daughter-in-law, Lara Trump.
In the clip, an AI-generated version of Lara Trump alleges that the president had “launched a historic new healthcare system,” featuring “medbeds,” an unhinged conspiracy theory about secret “beds” that can supposedly cure practically any ailment.
Perhaps even more bafflingly, a phony version of Trump proudly announces the venture from behind his desk in the Oval Office in the video, meaning the president is now sharing deepfakes of himself with no disclaimer that they’re not real.
“Every American will soon receive their own medbed card,” the fake Trump says in the clip. “With it, you’ll have guaranteed access to our new hospitals led by the top doctors in the nation, equipped with the most advanced technology in the world.”
The actual Fox News, usually a die-hard ally of the president, quickly acknowledged that the segment “never aired on Fox News Channel or any other Fox News Media platforms.”
It’s a new low for the president, in other words, who has already garnered a reputation for posting all kinds of AI slop on social media, from pictures of fake sports cars and a litany of AI self-portraits to videos of bearded belly dancers in Gaza.
There are a finite number of possibilities for his latest video, none of them good. Did he somehow think the video was real? Did he know it was fake, but shared it anyway? Was he trying to fool his followers, or is he himself not clear on what’s real and what’s an AI-generated fake?
“If ‘medbed’ technology were real, it would be the greatest medical advance in generations,” tweeted Media Matters senior fellow Matthew Gertz. “Trump should have to explain why he suggested it was using the channel he makes major policy announcements, and why he deleted it after the fact.”
“How do you bring people back to a shared reality when those in power keep stringing them along?” asked ethnographer and conspiracy theories researcher Noelle Cook.
“Trump deleted his bizarre post featuring an AI video of him endorsing ‘medbeds,’ which raises the question of whether he’s so confused that he thought it was a real video of him talking,” journalist Aaron Rupar offered.
The “medbed” conspiracy theory has direct ties to QAnon, as CNN reports, purporting that a shadowy UFO program inside the government has successfully reverse-engineered alien healing technologies, which it’s for some reason keeping from the public. (It’s worth pointing out here that the person currently running the government is Trump, meaning that in the anti-logic of QAnon, he would be the one keeping the nonexistent medbeds out of the hands of patients who would benefit from them.)
Some QAnon adherents even believed that a “medbed” was used to keep John F. Kennedy alive over half a century after his assassination.
Trump has amplified unhinged QAnon conspiracy theories for years now, sharing hundreds of posts promoting the fringe group’s ideas.
The “medbed” conspiracy theory is particularly unfortunate, considering the sorry state of the US healthcare system, which has historically excluded those who cannot afford insurance, largely leaving them to their own fate.
Worse yet, the Trump administration has taken direct aim at federal government initiatives in real life, slashing over half a trillion dollars from Medicare, a health insurance program for people over the age of 65 and younger people with disabilities.
Without any form of social or medical safety net, Americans could certainly benefit from an entirely fictional and worse-than-snake-oil healing mat.
“Trump was accidentally promoting Universal Healthcare for all Americans, but then deleted it once he was told what he did,” one X user joked.
More on Trump and AI slop: Trump’s Biggest Fans Furious at His Embrace of AI
The post Trump’s Grip on Reality Questioned After He Shares and Then Deletes Bizarre AI-Generated Video appeared first on Futurism.
Source: futurism.com
MAROKO133 Notes
This article is an automatic summary compiled from several trusted sources. We pick trending topics so you always stay up to date.
Next update in 30 minutes: a random theme awaits!