📌 MAROKO133 Breaking AI: Anthropic launches enterprise ‘Agent Skills’ and opens th
Anthropic said on Wednesday it would release its Agent Skills technology as an open standard, a strategic bet that sharing its approach to making AI assistants more capable will cement the company's position in the fast-evolving enterprise software market.
The San Francisco-based artificial intelligence company also unveiled organization-wide management tools for enterprise customers and a directory of partner-built skills from companies including Atlassian, Figma, Canva, Stripe, Notion, and Zapier.
The moves mark a significant expansion of a technology Anthropic first introduced in October, transforming what began as a niche developer feature into infrastructure that now appears poised to become an industry standard.
"We're launching Agent Skills as an independent open standard with a specification and reference SDK available at https://agentskills.io," Mahesh Murag, a product manager at Anthropic, said in an interview with VentureBeat. "Microsoft has already adopted Agent Skills within VS Code and GitHub; so have popular coding agents like Cursor, Goose, Amp, OpenCode, and more. We're in active conversations with others across the ecosystem."
Inside the technology that teaches AI assistants to do specialized work
Skills are, at their core, folders containing instructions, scripts, and resources that tell AI systems how to perform specific tasks consistently. Rather than requiring users to craft elaborate prompts each time they want an AI assistant to complete a specialized task, skills package that procedural knowledge into reusable modules.
The concept addresses a fundamental limitation of large language models: while they possess broad general knowledge, they often lack the specific procedural expertise needed for specialized professional work. A skill for creating PowerPoint presentations, for instance, might include preferred formatting conventions, slide structure guidelines, and quality standards — information the AI loads only when working on presentations.
Anthropic designed the system around what it calls "progressive disclosure." Each skill takes only a few dozen tokens when summarized in the AI's context window, with full details loading only when the task requires them. This architectural choice allows organizations to deploy extensive skill libraries without overwhelming the AI's working memory.
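To make the folder structure and progressive disclosure concrete, here is a minimal Python sketch of how an agent runtime might load skills in two stages. The SKILL.md filename, directory layout, and loader functions are illustrative assumptions for this sketch, not Anthropic's published specification.

```python
# Minimal sketch of progressive disclosure for agent skills (illustrative only).
# Assumed layout (hypothetical): skills/<skill-name>/SKILL.md plus scripts/resources.
from pathlib import Path

def load_skill_summaries(skills_dir: str) -> list[dict]:
    """Stage 1: read only a short header from each skill, so the catalog
    costs a few dozen tokens per skill in the model's context."""
    summaries = []
    for skill_file in Path(skills_dir).glob("*/SKILL.md"):
        header_lines = skill_file.read_text(encoding="utf-8").splitlines()[:4]
        summaries.append({"path": skill_file, "summary": " ".join(header_lines)})
    return summaries

def load_full_skill(skill: dict) -> str:
    """Stage 2: pull the full instructions (and any referenced scripts or
    resources) into context only when the task actually needs them."""
    return skill["path"].read_text(encoding="utf-8")

if __name__ == "__main__":
    catalog = load_skill_summaries("skills")               # cheap, always available
    chosen = next(s for s in catalog if "presentation" in s["summary"].lower())
    print(load_full_skill(chosen)[:200])                   # expensive, loaded on demand
```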
Fortune 500 companies are already using skills in legal, finance, and accounting
The new enterprise management features allow administrators on Anthropic's Team and Enterprise plans to provision skills centrally, controlling which workflows are available across their organizations while letting individual employees customize their experience.
"Enterprise customers are using skills in production across both coding workflows and business functions like legal, finance, accounting, and data science," Murag said. "The feedback has been positive because skills let them personalize Claude to how they actually work and get to high-quality output faster."
The community response has exceeded expectations, according to Murag: "Our skills repository already crossed 20k stars on GitHub, with tens of thousands of community-created and shared skills."
Atlassian, Figma, Stripe, and Zapier join Anthropic's skills directory at launch
Anthropic is launching with skills from ten partners, a roster that reads like a who's who of modern enterprise software. The presence of Atlassian, which makes Jira and Confluence, alongside design tools Figma and Canva, payment infrastructure company Stripe, and automation platform Zapier suggests Anthropic is positioning Skills as connective tissue between Claude and the applications businesses already use.
The business arrangements with these partners focus on ecosystem development rather than immediate revenue generation.
"Partners who build skills for the directory do so to enhance how Claude works with their platforms. It's a mutually beneficial ecosystem relationship similar to MCP connector partnerships," Murag explained. "There are no revenue-sharing arrangements at this time."
For vetting new partners, Anthropic is taking a measured approach. "We began with established partners and are developing more formal criteria as we expand," Murag said. "We want to create a valuable supply of skills for enterprises while helping partner products shine."
Notably, Anthropic is not charging extra for the capability. "Skills work across all Claude surfaces: Claude.ai, Claude Code, the Claude Agent SDK, and the API. They're included in Max, Pro, Team, and Enterprise plans at no additional cost. API usage follows standard API pricing," Murag said.
Why Anthropic is giving away its competitive advantage to OpenAI and Google
The decision to release Skills as an open standard is a calculated strategic choice. By making skills portable across AI platforms, Anthropic is betting that ecosystem growth will benefit the company more than proprietary lock-in would.
The strategy appears to be working. OpenAI has quietly adopted a structurally identical architecture in both ChatGPT and its Codex CLI tool. Developer Elias Judin discovered the implementation earlier this month, finding directories containing skill files that mirror Anthropic's specification: the same file naming conventions, the same metadata format, the same directory organization.
This convergence suggests the industry has found a common answer to a vexing question: how do you make AI assistants consistently good at specialized work without expensive model fine-tuning?
The timing aligns with broader standardization efforts in the AI industry. Anthropic donated its Model Context Protocol to the Linux Foundation on December 9, and both Anthropic and OpenAI co-founded the Agentic AI Foundation alongside Block. Google, Microsoft, and Amazon Web Services joined as members. The foundation will steward multiple open specifications, and Skills fit naturally into this standardization push.
"We've also seen how complementary skills and MCP servers are," Murag noted. "MCP provides secure connectivity to external software and data, while skills provide the procedural knowledge for using those tools effectively. Partners who've invested in strong MCP integrations were a natural starting point."
The AI industry abandons specialized agents in favor of one assistant that learns everything
The Skills approach is a philosophical shift in how the AI industry thinks about making AI assistants more capable. The traditional approach involved building s…
Content truncated automatically.
🔗 Source: venturebeat.com
📌 MAROKO133 Hot AI: ByteDance Introduces Astra: A Dual-Model Architecture for Auto
The increasing integration of robots across sectors, from industrial manufacturing to daily life, highlights a growing need for advanced navigation. Yet contemporary navigation systems struggle in diverse and complex indoor environments, exposing the limits of traditional approaches. Addressing the fundamental questions of “Where am I?”, “Where am I going?”, and “How do I get there?”, ByteDance has developed Astra, a dual-model architecture designed to overcome these bottlenecks and enable general-purpose mobile robots.
Traditional navigation systems typically consist of multiple, smaller, and often rule-based modules to handle the core challenges of target localization, self-localization, and path planning. Target localization involves understanding natural language or image cues to pinpoint a destination on a map. Self-localization requires a robot to determine its precise position within a map, especially challenging in repetitive environments like warehouses where traditional methods often rely on artificial landmarks (e.g., QR codes). Path planning further divides into global planning for rough route generation and local planning for real-time obstacle avoidance and reaching intermediate waypoints.
While foundation models have shown promise in integrating smaller models to tackle broader tasks, the optimal number of models and their effective integration for comprehensive navigation remained an open question.
ByteDance’s Astra, detailed in their paper “Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning” (website: https://astra-mobility.github.io/), addresses these limitations. Following the System 1/System 2 paradigm, Astra features two primary sub-models: Astra-Global and Astra-Local. Astra-Global handles low-frequency tasks like target and self-localization, while Astra-Local manages high-frequency tasks such as local path planning and odometry estimation. This architecture promises to revolutionize how robots navigate complex indoor spaces.
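One way to picture this division of labor is a two-rate control loop in which the global model is consulted occasionally while the local model runs every tick. The Python sketch below is an assumed structure inferred only from the description above; the class names, rates, and return values are placeholders, not ByteDance's code.

```python
# Assumed two-rate structure: the global model is queried occasionally, the local model every tick.
class GlobalModelStub:
    """System-2 stand-in: low-frequency self- and target localization."""
    def localize(self, image, instruction):
        return {"pose_6dof": (0, 0, 0, 0, 0, 0), "goal_node": 42}

class LocalModelStub:
    """System-1 stand-in: high-frequency local planning and odometry."""
    def step(self, sensor_frame, goal_node):
        return {"trajectory": [(0.1, 0.0)], "odom_delta": (0.01, 0.0, 0.0)}

def navigate(global_model, local_model, ticks=200, relocalize_every=100):
    goal = None
    for t in range(ticks):
        frame = {"images": None, "imu": None, "wheel": None}   # stand-in sensor read
        if goal is None or t % relocalize_every == 0:          # low frequency: Astra-Global
            goal = global_model.localize(frame["images"], "go to the charging dock")
        command = local_model.step(frame, goal["goal_node"])   # every tick: Astra-Local

navigate(GlobalModelStub(), LocalModelStub())
```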
Astra-Global: The Intelligent Brain for Global Localization
Astra-Global serves as the intelligent core of the Astra architecture, responsible for critical low-frequency tasks: self-localization and target localization. It functions as a Multimodal Large Language Model (MLLM), adept at processing both visual and linguistic inputs to achieve precise global positioning within a map. Its strength lies in utilizing a hybrid topological-semantic graph as contextual input, allowing the model to accurately locate positions based on query images or text prompts.
The construction of this localization system begins with offline mapping. The research team developed an offline method to build a hybrid topological-semantic graph G=(V,E,L), with the following components (a data-structure sketch follows the list):
- V (Nodes): Keyframes obtained by temporally downsampling the input video, with 6-Degrees-of-Freedom (DoF) camera poses estimated via Structure from Motion (SfM); each node encodes its camera pose and landmark references.
- E (Edges): Undirected edges establish connectivity based on relative node poses, crucial for global path planning.
- L (Landmarks): Semantic landmark information is extracted by Astra-Global from visual data at each node, enriching the map’s semantic understanding. These landmarks store semantic attributes and are connected to multiple nodes via co-visibility relationships.
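Read literally, G=(V,E,L) maps onto a simple data structure: pose-stamped keyframe nodes, undirected connectivity edges, and semantic landmarks linked to the nodes that observe them. The field names and types in this hedged Python sketch are assumptions for illustration only, not the paper's data model.

```python
# Hypothetical in-memory form of the hybrid topological-semantic graph G = (V, E, L).
from dataclasses import dataclass, field

@dataclass
class KeyframeNode:                  # V: downsampled keyframe with an SfM-estimated pose
    node_id: int
    pose_6dof: tuple                 # (x, y, z, roll, pitch, yaw)
    landmark_ids: list = field(default_factory=list)

@dataclass
class Landmark:                      # L: semantic landmark seen from one or more nodes
    landmark_id: int
    attributes: dict                 # e.g. {"category": "sofa", "function": "seating"}
    covisible_nodes: list = field(default_factory=list)

@dataclass
class TopoSemanticGraph:
    nodes: dict = field(default_factory=dict)        # node_id -> KeyframeNode
    edges: set = field(default_factory=set)          # E: undirected node connectivity
    landmarks: dict = field(default_factory=dict)    # landmark_id -> Landmark

    def connect(self, a: int, b: int) -> None:
        """Add an undirected edge; this connectivity drives global path planning."""
        self.edges.add(frozenset((a, b)))
```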
At run time, Astra-Global performs both self-localization and target localization through a coarse-to-fine, two-stage visual-language process. The coarse stage analyzes the input image and localization prompt, detects landmarks, establishes correspondences with the pre-built landmark map, and filters candidate nodes by visual consistency. The fine stage then uses the query image and the coarse-stage output to sample reference nodes from the offline map, comparing their visual and positional information to directly output the predicted pose.
For language-based target localization, the model interprets natural language instructions, identifies relevant landmarks using their functional descriptions within the map, and then leverages landmark-to-node association mechanisms to locate relevant nodes, retrieving target images and 6-DoF poses.
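The coarse-to-fine procedure can be summarized as a two-stage pipeline. The sketch below paraphrases that description on top of the graph structure sketched earlier; the function bodies are trivial stand-ins (keyword matching instead of an MLLM, a reference node's pose instead of pose regression) and are not the paper's implementation.

```python
# Schematic coarse-to-fine visual-language localization (paraphrased sketch).
def coarse_localize(query_image, prompt, graph):
    """Coarse stage: detect landmarks in the query, match them against the
    pre-built landmark map, and keep visually consistent candidate nodes."""
    matched = [lm for lm in graph.landmarks.values()
               if lm.attributes.get("category", "") in prompt.lower()]   # stand-in for MLLM detection
    return sorted({n for lm in matched for n in lm.covisible_nodes})

def fine_localize(query_image, candidates, graph):
    """Fine stage: compare the query against sampled reference nodes and
    output a predicted 6-DoF pose (here simply the top candidate's pose)."""
    if not candidates:
        return None
    return graph.nodes[candidates[0]].pose_6dof                          # stand-in for pose prediction

def localize(query_image, prompt, graph):
    return fine_localize(query_image, coarse_localize(query_image, prompt, graph), graph)
```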
To empower Astra-Global with robust localization abilities, the team employed a meticulous training methodology. Using Qwen2.5-VL as the backbone, they combined Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO). SFT involved diverse datasets for various tasks, including coarse and fine localization, co-visibility detection, and motion trend estimation. In the GRPO phase, a rule-based reward function (including format, landmark extraction, map matching, and extra landmark rewards) was used to train for visual-language localization. Experiments showed GRPO significantly improved Astra-Global’s zero-shot generalization, achieving 99.9% localization accuracy in unseen home environments, surpassing SFT-only methods.
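Only the four reward terms (format, landmark extraction, map matching, extra landmarks) come from the description above; how they are weighted and checked is not specified. The following is a hedged sketch of how such a rule-based reward could be composed, with invented weights and a made-up response format.

```python
# Assumed composition of a rule-based localization reward for GRPO-style training.
def localization_reward(response: dict, ground_truth: dict) -> float:
    """Combine the four rule-based terms named above; weights are invented."""
    reward = 0.0
    if {"landmarks", "node_id"} <= response.keys():                    # format reward
        reward += 0.1
    true_lms = set(ground_truth["landmarks"])
    pred_lms = set(response.get("landmarks", []))
    if true_lms:                                                       # landmark-extraction reward
        reward += 0.3 * len(pred_lms & true_lms) / len(true_lms)
    if response.get("node_id") == ground_truth["node_id"]:             # map-matching reward
        reward += 0.5
    reward -= 0.1 * len(pred_lms - true_lms)                           # extra-landmark term
    return reward

print(localization_reward(
    {"landmarks": ["sofa", "tv"], "node_id": 7},
    {"landmarks": ["sofa", "tv", "lamp"], "node_id": 7}))
```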
Astra-Local: The Intelligent Assistant for Local Planning
Astra-Local handles Astra's high-frequency tasks: it is a multi-task network that efficiently generates local paths and accurately estimates odometry from sensor data. Its architecture comprises three core components: a 4D spatio-temporal encoder, a planning head, and an odometry head.
The 4D spatio-temporal encoder replaces traditional mobile stack perception and prediction modules. It begins with a 3D spatial encoder that processes N omnidirectional images through a Vision Transformer (ViT) and Lift-Splat-Shoot to convert 2D image features into 3D voxel features. This 3D encoder is trained using self-supervised learning via 3D volumetric differentiable neural rendering. The 4D spatio-temporal encoder then builds upon the 3D encoder, taking past voxel features and future timestamps as input to predict future voxel features through ResNet and DiT modules, providing current and future environmental representations for planning and odometry.
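As a rough mental model of that pipeline, the skeleton below shows a 2D-to-3D spatial encoder followed by a temporal predictor over voxel features. The module internals are crude stand-ins (a strided convolution instead of a ViT with Lift-Splat-Shoot, a single 3D convolution instead of ResNet/DiT blocks), and every shape and dimension is an assumption for illustration.

```python
# Skeleton of the described 2D -> 3D -> 4D feature pipeline (shapes and modules assumed).
import torch
import torch.nn as nn

class SpatialEncoder3D(nn.Module):
    """Stand-in for the ViT + Lift-Splat-Shoot step: N images -> one voxel feature grid."""
    def __init__(self, feat_dim=64, grid=(32, 32, 8)):
        super().__init__()
        self.grid = grid
        self.patchify = nn.Conv2d(3, feat_dim, kernel_size=8, stride=8)  # crude ViT substitute

    def forward(self, images):                           # images: (B, N, 3, H, W)
        b, n = images.shape[:2]
        feats = self.patchify(images.flatten(0, 1)).mean(dim=(2, 3))     # per-camera feature (B*N, C)
        fused = feats.view(b, n, -1).mean(dim=1)                         # pool the N cameras
        voxels = fused[:, :, None, None, None].expand(-1, -1, *self.grid)
        return voxels.contiguous()                                       # (B, C, X, Y, Z)

class TemporalPredictor4D(nn.Module):
    """Stand-in for the ResNet/DiT future-voxel predictor."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.mix = nn.Conv3d(feat_dim, feat_dim, kernel_size=3, padding=1)

    def forward(self, past_voxels, future_timestamp):    # past_voxels: (B, C, X, Y, Z)
        return self.mix(past_voxels) + future_timestamp  # timestamp broadcast as crude conditioning

voxels = SpatialEncoder3D()(torch.rand(1, 4, 3, 64, 64))
future = TemporalPredictor4D()(voxels, torch.tensor(0.5))
```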
The planning head, conditioned on the pre-trained 4D features, robot speed, and task information, generates executable trajectories using Transformer-based flow matching. To prevent collisions, it incorporates a masked ESDF (Euclidean Signed Distance Field) loss: the ESDF is computed from a 3D occupancy map and a 2D ground-truth trajectory mask is applied, significantly reducing collision rates. Experiments demonstrate superior performance in collision rate and overall score on out-of-distribution (OOD) datasets compared to other methods.
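The masked-ESDF idea can be illustrated with a simplified 2D version: compute a distance field from the occupancy grid, then penalize planned waypoints that come too close to obstacles, but only inside the ground-truth trajectory mask. The margin, cell size, and masking convention below are assumptions, not the paper's loss.

```python
# Hedged, 2D-simplified sketch of a masked ESDF collision penalty.
import numpy as np
from scipy import ndimage

def masked_esdf_penalty(occupancy_2d, trajectory_xy, traj_mask, margin=0.5, cell=0.05):
    """Penalize waypoints closer than `margin` meters to an obstacle,
    counted only where the 2D ground-truth trajectory mask is set."""
    # Distance (meters) from every free cell to the nearest occupied cell.
    esdf = ndimage.distance_transform_edt(occupancy_2d == 0) * cell
    penalties = []
    for x, y in trajectory_xy:
        i, j = int(round(x / cell)), int(round(y / cell))
        if traj_mask[i, j]:                              # supervise only the masked region
            penalties.append(max(0.0, margin - esdf[i, j]))
    return float(np.mean(penalties)) if penalties else 0.0

grid = np.zeros((100, 100), dtype=np.uint8); grid[50:60, 50:60] = 1   # one obstacle block
mask = np.ones_like(grid, dtype=bool)
print(masked_esdf_penalty(grid, [(1.0, 1.0), (2.5, 2.5)], mask))
```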
The odometry head predicts the robot’s relative pose using current and past 4D features and additional sensor data (e.g., IMU, wheel data). It trains a Transformer model to fuse information from different sensors. Each sensor modality is processed by a specific tokenizer, combined with modality embeddings and temporal positional embeddi…
Content truncated automatically.
🔗 Source: syncedreview.com
🤖 MAROKO133 Note
This article is an automatically generated digest of several trusted sources. We pick trending topics so you always stay up to date.
✅ Next update in 30 minutes: a random topic awaits!