📌 MAROKO133 Hot ai: Researchers from PSU and Duke introduce "Automated Failure Attribution" for LLM Multi-Agent Systems
Share My Research is Synced’s column that welcomes scholars to share their own research breakthroughs with over 2M global AI enthusiasts. Beyond technological advances, Share My Research also calls for interesting stories behind the research and exciting research ideas.
Meet the authors
Institutions: Penn State University, Duke University, Google DeepMind, University of Washington, Meta, Nanyang Technological University, and Oregon State University. The co-first authors are Shaokun Zhang of Penn State University and Ming Yin of Duke University.
In recent years, LLM Multi-Agent systems have garnered widespread attention for their collaborative approach to solving complex problems. However, it’s a common scenario for these systems to fail at a task despite a flurry of activity. This leaves developers with a critical question: which agent, at what point, was responsible for the failure? Sifting through vast interaction logs to pinpoint the root cause feels like finding a needle in a haystack—a time-consuming and labor-intensive effort.
This is a familiar frustration for developers. In increasingly complex Multi-Agent systems, failures are not only common but also incredibly difficult to diagnose due to the autonomous nature of agent collaboration and long information chains. Without a way to quickly identify the source of a failure, system iteration and optimization grind to a halt.
To address this challenge, researchers from Penn State University and Duke University, in collaboration with institutions including Google DeepMind, have introduced the novel research problem of “Automated Failure Attribution.” They have constructed the first benchmark dataset for this task, Who&When, and have developed and evaluated several automated attribution methods. This work not only highlights the complexity of the task but also paves a new path toward enhancing the reliability of LLM Multi-Agent systems.
The paper has been accepted as a Spotlight presentation at the top-tier machine learning conference, ICML 2025, and the code and dataset are now fully open-source.
Paper: https://arxiv.org/pdf/2505.00212
Code: https://github.com/mingyin1/Agents_Failure_Attribution
Dataset: https://huggingface.co/datasets/Kevin355/Who_and_When
Research Background and Challenges
LLM-driven Multi-Agent systems have demonstrated immense potential across many domains. However, these systems are fragile; errors by a single agent, misunderstandings between agents, or mistakes in information transmission can lead to the failure of the entire task.
Currently, when a system fails, developers are often left with manual and inefficient methods for debugging:
Manual Log Archaeology: Developers must manually review lengthy interaction logs to find the source of the problem.
Reliance on Expertise: The debugging process is highly dependent on the developer’s deep understanding of the system and the task at hand.
This “needle in a haystack” approach to debugging is not only inefficient but also severely hinders rapid system iteration and the improvement of system reliability. There is an urgent need for an automated, systematic method to pinpoint the cause of failures, effectively bridging the gap between “evaluation results” and “system improvement.”
Core Contributions
This paper makes several groundbreaking contributions to address the challenges above:
1. Defining a New Problem: The paper is the first to formalize “automated failure attribution” as a distinct research task: identifying the failure-responsible agent and the decisive error step that led to the task’s failure.
2. Constructing the First Benchmark Dataset, Who&When: This dataset includes a wide range of failure logs collected from 127 LLM Multi-Agent systems, which were either algorithmically generated or hand-crafted by experts to ensure realism and diversity. Each failure log is accompanied by fine-grained human annotations (illustrated schematically after this list) for:
Who: The agent responsible for the failure.
When: The specific interaction step where the decisive error occurred.
Why: A natural language explanation of the cause of the failure.
3. Exploring Initial “Automated Attribution” Methods: Using the Who&When dataset, the paper designs and assesses three distinct methods for automated failure attribution:
All-at-Once: This method provides the LLM with the user query and the complete failure log, asking it to identify the responsible agent and the decisive error step in a single pass. While cost-effective, it may struggle to pinpoint precise errors in long contexts.
Step-by-Step: This approach mimics manual debugging by having the LLM review the interaction log sequentially, making a judgment at each step until the error is found. It is more precise at locating the error step but incurs higher costs and risks accumulating errors.
Binary Search: A compromise between the first two methods, this strategy repeatedly divides the log in half, using the LLM to determine which segment contains the error. It then recursively searches the identified segment, offering a balance of cost and performance. (A minimal sketch of this strategy appears below.)
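For concreteness, here is a schematic, hypothetical example of what one annotated failure record could look like. The field names and contents are illustrative only; the actual Who&When schema on Hugging Face may differ:

```python
# Illustrative (hypothetical) shape of one annotated failure log record;
# the real Who&When schema may use different field names.
example_record = {
    "query": "Book the cheapest flight from NYC to Tokyo next Friday.",
    "log": [
        {"step": 0, "agent": "Planner", "content": "Delegate search to WebAgent."},
        {"step": 1, "agent": "WebAgent", "content": "Searched flights for the wrong date."},
        {"step": 2, "agent": "Planner", "content": "Accepted results without checking."},
    ],
    "who": "WebAgent",  # failure-responsible agent
    "when": 1,          # decisive error step
    "why": "WebAgent queried the wrong departure date, corrupting all later steps.",
}
```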
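The binary-search strategy is the easiest to make concrete. Below is a minimal, hypothetical sketch of that loop, assuming a generic `ask_llm` helper that sends a prompt to any LLM and returns its reply. It is a sketch of the idea, not the paper’s released implementation:

```python
# Minimal sketch of the binary-search attribution loop described above.
# `ask_llm` is a hypothetical stand-in for any LLM chat call; it is not
# part of the paper's released code.

def ask_llm(prompt: str) -> str:
    """Hypothetical helper; wire this to your LLM provider's API."""
    raise NotImplementedError

def binary_search_attribution(query: str, log: list[str]) -> int:
    """Return the index of the step the LLM blames for the failure."""
    lo, hi = 0, len(log)  # current search window over interaction steps
    while hi - lo > 1:
        mid = (lo + hi) // 2
        prompt = (
            f"The task below failed.\nTask: {query}\n\n"
            f"Segment A (steps {lo}-{mid - 1}):\n" + "\n".join(log[lo:mid]) +
            f"\n\nSegment B (steps {mid}-{hi - 1}):\n" + "\n".join(log[mid:hi]) +
            "\n\nWhich segment contains the decisive error? Answer 'A' or 'B'."
        )
        if ask_llm(prompt).strip().upper().startswith("A"):
            hi = mid  # error lies in the lower half
        else:
            lo = mid  # error lies in the upper half
    return lo  # the single remaining step index
```

Each iteration halves the window, so a log of n steps takes roughly log2(n) LLM calls, which is where the cost-versus-precision trade-off comes from.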
Experimental Results and Key Findings
Experiments were conducted in two settings: one where the LLM knows the ground truth answer to the problem the Multi-Agent system is trying to solve (With Ground Truth) and one where it does not (Without Ground Truth). The primary model used was GPT-4o, though other models were also tested. The systematic evaluation of these methods on the Who&When dataset yielded several important insights:
- A Long Way to Go: Current methods are far from perfect. Even the best-performing single method achieved an accuracy of only about 53.5% in identifying the responsible agent and a mere 14.2% in pinpointing the exact error step. Some methods performed even worse than random guessing, underscoring the difficulty of the task.
- No “All-in-One” Solution: Different methods excel at different aspects of the problem. The All-at-Once method is better at identifying “Who,” while the Step-by-Step method is more effective at determining “When.” The Binary Search method provides a middle-ground performance.
- Hybrid Approaches Show Promise but at a High Cost: The researchers found that combining different methods, such as using the All-at-Once approach to identify a potential agent and then applying the Step-by-Step method to find the error, can improve overall performance. However, this comes with a significant increase in computational cost.
- State-of-the-Art Models Struggle: Surprisingly, even the most advanced reasoning models…
Content automatically truncated.
🔗 Source: syncedreview.com
📌 MAROKO133 Update ai: ChatGPT Group Chats are here … but not for everyone (yet)
It was originally found in leaked code and publicized by AI influencers on X, but OpenAI has made it official: ChatGPT now offers Group Chats, allowing multiple users to join the same, single ChatGPT conversation and send messages to each other and the underlying large language model (LLM), online and via its mobile apps.
Imagine adding ChatGPT as another member of your existing group chats, texting it as you would a friend or family member and having it reply in the same thread, and you'll have an idea of the intriguing power and potential of this feature.
However, the feature is only available as a limited pilot for now to ChatGPT users in Japan, New Zealand, South Korea, and Taiwan (all tiers, including free usage).
“Group chats are just the beginning of ChatGPT becoming a shared space to collaborate and interact with others,” OpenAI wrote in its announcement.
This development builds on internal experimentation at OpenAI, where technical staffer Keyan Zhang said in a post on X that OpenAI's team initially considered multiplayer ChatGPT to be “a wild, out-of-distribution idea.”
According to Zhang, the model’s performance in those early tests demonstrated far more potential than existing interfaces typically allow.
The move follows Microsoft, an OpenAI investor and competitor, adding group chats to its Copilot AI assistant last month. Anthropic, meanwhile, has offered shareable context and chat histories for its Claude models through the Projects feature introduced in summer 2024, though neither is a simultaneous, real-time group chat in the same sense.
Collaborative functionality integrated into ChatGPT
Group chats function as shared conversational spaces where users can plan events, brainstorm ideas, or collaborate on projects with the added support of ChatGPT.
These conversations are distinct from individual chats and are excluded from ChatGPT’s memory system—meaning no data from these group threads is used to train or personalize future interactions.
Users can initiate a group chat by selecting the people icon in a new or existing conversation. Adding others creates a copy of the original thread, preserving the source dialogue. Participants can join via a shareable link and are prompted to create a profile with a name, username, and photo. The feature supports 1 to 20 participants per group.
Each group chat is listed in a new section of the ChatGPT interface, and users can manage settings like naming the group, adding or removing participants, or muting notifications.
Powered by GPT-5.1 with expanded tools
The new group chat feature runs on GPT-5.1 Auto, a backend setting that chooses the optimal model based on the user’s subscription tier and the prompt.
Functionality such as search, image generation, file upload, and dictation is available inside group conversations.
Importantly, the system applies rate limits only when ChatGPT is producing responses. Direct messages between human users in the group do not count toward any plan’s message cap.
OpenAI has added new social features to ChatGPT in support of this group dynamic. The model can react with emojis, interpret conversational context to decide when to respond, and personalize generated content using members’ profile photos—such as inserting user likenesses into images when asked.
Privacy by default, controls for younger users
OpenAI emphasized that privacy and user control are integral to group chat design. The feature operates independently of the user’s personalized ChatGPT memory, and no new memories are created from these interactions.
Participation requires an invitation link, and members are always able to see who is in a chat or leave at any time.
Users under the age of 18 are automatically shielded from sensitive content in group chats. Parents or guardians can disable group chat access altogether via built-in parental controls.
Group creators retain special permissions, including immunity from being removed by others. All other participants can be added or removed by group members.
A testbed for shared AI experiences
OpenAI frames group chats as an early step toward richer, multi-user applications of AI, hinting at broader ambitions for ChatGPT as a shared workspace. The company expects to expand access over time and refine the feature based on how early users engage with it.
Keyan Zhang’s post suggests that the underlying model capabilities are far ahead of the interfaces users currently interact with. This pilot, in OpenAI’s view, offers a new “container” where more of the model’s latent capacity can be surfaced.
“Our models have a lot more room to shine than today’s experiences show, and the current containers only use a fraction of their capabilities,” Zhang said.
With this initial pilot focused on a limited set of markets, OpenAI is likely monitoring both usage patterns and cultural fit as it plans for broader deployment. For now, the group chat experiment offers a new way for users to interact with ChatGPT—and with each other—in real time, using a conversational interface that blends productivity and personalization.
Developer access: Still unclear
OpenAI has not provided any indication that Group Chats will be accessible via the API or SDK. The current rollout is framed strictly within the ChatGPT product environment, with no mention of tool calls, developer hooks, or integration support for programmatic use. This absence of signaling leaves it unclear whether the company views group interaction as a future developer primitive or as a contained UX feature for end users only.
For enterprise teams exploring how to replicate multi-user collaboration with generative models, any current implementation would require custom orchestration—such as managing multi-party context and prompts across separate API calls, and handling session state and response merging externally. Until OpenAI provides formal support, Group Chats remain a closed interface feature rather than a developer-accessible capability.
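To illustrate the kind of custom orchestration that implies, here is a minimal, hypothetical sketch under stated assumptions: a generic `call_model` helper stands in for any chat-completion API, since no official group-chat endpoint exists:

```python
# Hypothetical sketch of external multi-user orchestration: a shared history,
# author-tagged messages, and a separate chat-completion call per response.
from dataclasses import dataclass, field

def call_model(messages: list[dict]) -> str:
    """Hypothetical helper; wire this to your provider's chat API."""
    raise NotImplementedError

@dataclass
class GroupSession:
    """Externally managed state for one multi-user thread."""
    history: list[dict] = field(default_factory=list)

    def post(self, user: str, text: str) -> None:
        # Tag each human message with its author so the model can tell
        # participants apart within a single shared context.
        self.history.append({"role": "user", "content": f"[{user}] {text}"})

    def ask_model(self) -> str:
        reply = call_model(self.history)
        self.history.append({"role": "assistant", "content": reply})
        return reply
```

In this pattern, session state, participant identity, and response merging all live outside the model; that is precisely the overhead a native, developer-accessible group-chat primitive would remove.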
Implications for enterprise AI and data leaders
For enterprise teams already leveraging AI platforms—or preparing to—OpenAI’s group chat feature introduces a new layer of multi-user collaboration that could shift how generative models are deployed across workflows. While the pilot is limited to users in Japan, New Zealand, South Korea, and Taiwan, its design and roadmap offer key signals for AI engineers, orchestration specialists, and data leads globally.
AI engineers managing large language model (LLM) deployments can now begin to conceptualize real-time, multi-user interfaces not just as support tools, but as collaborative environments for research, content generation, and ideation. This adds another front in model tuning: not just how models respond to individuals, but how they behave in live group settings with context shifts and varied user intentions.
For AI orchestration leads, the ability to integrate ChatGPT into collaborative flows without exposing private memory or requiring custom builds may reduce …
Content automatically truncated.
🔗 Source: venturebeat.com
🤖 A Note from MAROKO133
This article is an automatic digest of several trusted sources. We pick trending topics so you stay up to date without missing out.
✅ Next update in 30 minutes. A random theme awaits!