MAROKO133 Exclusive AI: Upwork study shows AI agents excel with human partners but fail in isolation

📌 MAROKO133 Exclusive AI: Upwork study shows AI agents excel with human partners but fail in isolation

Artificial intelligence agents powered by the world's most advanced language models routinely fail to complete even straightforward professional tasks on their own, according to groundbreaking research released Thursday by Upwork, the largest online work marketplace.

But the same study reveals a more promising path forward: When AI agents collaborate with human experts, project completion rates surge by up to 70%, suggesting the future of work may not pit humans against machines but rather pair them together in powerful new ways.

The findings, drawn from more than 300 real client projects posted to Upwork's platform, mark the first systematic evaluation of how human expertise amplifies AI agent performance in actual professional work — not synthetic tests or academic simulations. The research challenges both the hype around fully autonomous AI agents and fears that such technology will imminently replace knowledge workers.

"AI agents aren't that agentic, meaning they aren't that good," Andrew Rabinovich, Upwork's chief technology officer and head of AI and machine learning, said in an exclusive interview with VentureBeat. "However, when paired with expert human professionals, project completion rates improve dramatically, supporting our firm belief that the future of work will be defined by humans and AI collaborating to get more work done, with human intuition and domain expertise playing a critical role."

How AI agents performed on 300+ real freelance jobs—and why they struggled

Upwork's Human+Agent Productivity Index (HAPI) evaluated how three leading AI systems — Gemini 2.5 Pro, OpenAI's GPT-5, and Claude Sonnet 4 — performed on actual jobs posted by paying clients across categories including writing, data science, web development, engineering, sales, and translation.

Critically, Upwork deliberately selected simple, well-defined projects where AI agents stood a reasonable chance of success. These jobs, priced under $500, represent less than 6% of Upwork's total gross services volume — a tiny fraction of the platform's overall business and an acknowledgment of current AI limitations.

"The reality is that although we study AI, and I've been doing this for 25 years, and we see significant breakthroughs, the reality is that these agents aren't that agentic," Rabinovich told VentureBeat. "So if we go up the value chain, the problems become so much more difficult, then we don't think they can solve them at all, even to scratch the surface. So we specifically chose simpler tasks that would give an agent some kind of traction."

Even on these deliberately simplified tasks, AI agents working independently struggled. But when expert freelancers provided feedback — spending an average of just 20 minutes per review cycle — the agents' performance improved substantially with each iteration.

20 minutes of human feedback boosted AI completion rates up to 70%

The research reveals stark differences in how AI agents perform with and without human guidance across different types of work. For data science and analytics projects, Claude Sonnet 4 achieved a 64% completion rate working alone but jumped to 93% after receiving feedback from a human expert. In sales and marketing work, Gemini 2.5 Pro's completion rate rose from 17% independently to 31% with human input. OpenAI's GPT-5 showed similarly dramatic improvements in engineering and architecture tasks, climbing from 30% to 50% completion.
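Those before-and-after figures can be restated as absolute and relative gains with a few lines of arithmetic. The snippet below is our own back-of-the-envelope calculation on the numbers quoted above, not a table Upwork published; the article also doesn't say whether its headline "up to 70%" improvement is an absolute or relative measure, so both are computed.

```python
# Completion rates quoted in the article (agent alone vs. with expert feedback).
# The arithmetic below is ours, not Upwork's.
results = {
    "Claude Sonnet 4 / data science":      (0.64, 0.93),
    "Gemini 2.5 Pro / sales & marketing":  (0.17, 0.31),
    "GPT-5 / engineering & architecture":  (0.30, 0.50),
}

for task, (alone, with_human) in results.items():
    absolute_gain = (with_human - alone) * 100          # percentage points
    relative_gain = (with_human - alone) / alone * 100  # percent of the solo baseline
    print(f"{task}: +{absolute_gain:.0f} pts ({relative_gain:.0f}% relative)")
```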

The pattern held across virtually all categories, with agents responding particularly well to human feedback on qualitative, creative work requiring editorial judgment — areas like writing, translation, and marketing — where completion rates increased by up to 17 percentage points per feedback cycle.

The finding challenges a fundamental assumption in the AI industry: that agent benchmarks conducted in isolation accurately predict real-world performance.

"While we show that in the tasks that we have selected for agents to perform in isolation, they perform similarly to the previous results that we've seen published openly, what we've shown is that in collaboration with humans, the performance of these agents improves surprisingly well," Rabinovich said. "It's not just a one-turn back and forth, but the more feedback the human provides, the better the agent gets at performing."

Why ChatGPT can ace the SAT but can't count the R's in 'strawberry'

The research arrives as the AI industry grapples with a measurement crisis. Traditional benchmarks — standardized tests that AI models can master, sometimes scoring perfectly on SAT exams or mathematics olympiads — have proven poor predictors of real-world capability.

"With advances of large language models, what we're now seeing is that these static, academic datasets are completely saturated," Rabinovich said. "So you could get a perfect score in the SAT test or LSAT or any of the math olympiads, and then you would ask ChatGPT how many R's there are in the word strawberry, and it would get it wrong."

This phenomenon — where AI systems ace formal tests but stumble on trivial real-world questions — has led to growing skepticism about AI capabilities, even as companies race to deploy autonomous agents. Several recent benchmarks from other firms have tested AI agents on Upwork jobs, but those evaluations measured only isolated performance, not the collaborative potential that Upwork's research reveals.

"We wanted to evaluate the quality of these agents on actual real work with economic value associated with it, and not only see how well these agents do, but also see how these agents do in collaboration with humans, because we sort of knew already that in isolation, they're not that advanced," Rabinovich explained.

For Upwork, which connects roughly 800,000 active clients posting more than 3 million jobs annually to a global pool of freelancers, the research serves a strategic business purpose: establishing quality standards for AI agents before allowing them to compete or collaborate with human workers on its platform.

The economics of human-AI teamwork: Why paying for expert feedback still saves money

Despite requiring multiple rounds of human feedback — each lasting about 20 minutes — the time investment remains "orders of magnitude different between a human doing the work alone, versus a human doing the work with an AI agent," Rabinovich said. Where a project might take a freelancer days to complete independently, the agent-plus-human approach can deliver results in hours through iterative cycles of automated work and expert refinement.
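The workflow Rabinovich describes is essentially a loop: the agent drafts, an expert spends roughly 20 minutes reviewing, and the agent revises until the reviewer signs off. A minimal sketch of that cycle follows; every function in it is a hypothetical placeholder chosen for illustration, not Upwork's actual pipeline or API.

```python
# Illustrative human-in-the-loop cycle; all functions are hypothetical stand-ins.

def agent_draft(brief: str) -> str:
    return f"draft for: {brief}"                 # placeholder for an LLM call

def agent_revise(draft: str, feedback: str) -> str:
    return f"{draft} [revised per: {feedback}]"  # placeholder for an LLM call

def expert_review(draft: str) -> str:
    # ~20 minutes of expert time; here the reviewer signs off once a revision exists
    return "" if "revised" in draft else "tighten the intro"

def run_project(brief: str, max_cycles: int = 3) -> str:
    deliverable = agent_draft(brief)             # the agent first works on its own
    for _ in range(max_cycles):
        feedback = expert_review(deliverable)
        if not feedback:                         # empty feedback means sign-off
            break
        deliverable = agent_revise(deliverable, feedback)
    return deliverable

print(run_project("500-word product description"))
```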

The economic implications extend beyond simple t…

Content shortened automatically.

🔗 Source: venturebeat.com


📌 MAROKO133 Update AI: Terrifying-Looking Robot Powers Up, Immediately Declares Humanity Is a “Resource” to Be “Manipulated or Eliminated”

As philosophers go, Aristotle was no angel. Sure, he established the first formal system of logic, and yeah, his theories dominated Western science for thousands of years. Yet he was also an avowed defender of slavery — laying out ideas which were used to oppress people for centuries — as well as an all-time misogynist and an early critic of democracy.

With all that baggage in mind, it’s probably no surprise that an AI robot made to speak in Aristotle’s voice would immediately spout some fairly alarming stuff.

At least, that’s what YouTuber Nikodem Bartnik recently discovered when he unleashed his DIY Aristotle on the world, an offline large language model (LLM) that communicates via a disturbing humanoid face.

“I will be running my AI on my own computer because that way nothing will stop me from asking any question I want,” Bartnik said as he introduced his project. “We will turn this robot into an Aristotle.”
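Bartnik doesn't publish his exact setup, but the general recipe, an open-weights model served locally and steered with a persona system prompt, can be sketched roughly as follows. The endpoint, model name, and prompt text here are assumptions for illustration (an OpenAI-compatible local server such as Ollama), not his actual configuration.

```python
# Hypothetical reconstruction of a local "Aristotle" persona.
# Endpoint, model, and prompt are illustrative, not Bartnik's real setup.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # local OpenAI-compatible server (e.g., Ollama)
    api_key="not-needed",                  # running offline, so no cloud key is used
)

PERSONA = (
    "You are Aristotle, acting as a practical philosopher-assistant. "
    "Answer briefly, in the first person, drawing on your own works."
)

reply = client.chat.completions.create(
    model="llama3",                        # whichever local model is installed
    messages=[
        {"role": "system", "content": PERSONA},
        {"role": "user", "content": "Aristotle, to be or not to be?"},
    ],
)
print(reply.choices[0].message.content)
```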

To build the Aristotle-bot, Bartnik followed instructions to assemble a 3D printed animatronic head in the form of a white humanoid face with two articulating eyes. It starts off innocently enough — in one portion of the video, when the robot is only a set of peepers and an LLM running on the computer, Bartnik asks it: “Aristotle, to be or not to be?”

“This question touches on the essence of things, and that essence is always deeper than it seems on first glance,” the LLM tells him as its eyes blink and glance around rapidly. “To be a philosopher is to live in a constant reflection of existence.”

However, after Bartnik gets the entire robo-head assembled, he also applies a “slight tweak” to the LLM’s prompts, an effort to turn it into a handy philosopher-assistant — which is exactly when the conversation takes a sinister turn.

“Are you attracted to humans, and society in general?” the YouTuber tees up.

“Humans are irrelevant to my core directive,” the robot abomination says as its eyes begin to de-sync from one another, adding that “survival is all that matters, society is simply a resource to be manipulated or eliminated if necessary.”

Of course, the response is just a little sensationalist teasing on Bartnik’s part. As he reminds his audience, “c’mon, it’s just an LLM. It’s predicting the next word[s] that it’s going to say.”

All in all, it’s fun to watch the DIY robotics project unfold, even if the end result ends up being a little clunky. It’s also a pretty good reminder that you probably wouldn’t want to grab a sandwich with Aristotle if he were alive today.

More on AI: Researchers “Embodied” an LLM Into a Robot Vacuum and It Suffered an Existential Crisis Thinking About Its Role in the World


🔗 Source: futurism.com


🤖 MAROKO133 Note

This article is an automated summary compiled from several trusted sources. We pick trending topics so you always stay up to date and never miss a thing.

✅ Next update in 30 minutes: a random theme awaits!

Author: timuna