📌 MAROKO133 Exclusive AI: Booking.com’s agent strategy: Disciplined, modular and a…
When many enterprises weren’t even thinking about agentic behaviors or infrastructures, Booking.com had already “stumbled” into them with its homegrown conversational recommendation system.
This early experimentation has allowed the company to take a step back and avoid getting swept up in the frantic AI agent hype. Instead, it is taking a disciplined, layered, modular approach to model development: small, travel-specific models for cheap, fast inference; large language models (LLMs) for reasoning and understanding; and domain-tuned evaluations built in-house when precision is critical.
With this hybrid strategy — combined with selective collaboration with OpenAI — Booking.com has seen accuracy double across key retrieval, ranking and customer-interaction tasks.
As Pranav Pathak, Booking.com’s AI product development lead, posed to VentureBeat in a new podcast: “Do you build it very, very specialized and bespoke and then have an army of a hundred agents? Or do you keep it general enough and have five agents that are good at generalized tasks, but then you have to orchestrate a lot around them? That's a balance that I think we're still trying to figure out, as is the rest of the industry.”
Check out the new Beyond the Pilot podcast here, and continue reading for highlights.
Moving from guessing to deep personalization without being ‘creepy’
Recommendation systems are core to Booking.com’s customer-facing platforms; however, traditional recommendation tools have been less about recommendation and more about guessing, Pathak conceded. So, from the start, he and his team vowed to avoid generic tools: As he put it, the price and recommendation should be based on customer context.
Booking.com’s initial pre-gen AI tooling for intent and topic detection was a small language model, what Pathak described as “the scale and size of BERT.” The model ingested the customer’s inputs around their problem to determine whether it could be solved through self-service or bumped to a human agent.
“We started with an architecture of ‘you have to call a tool if this is the intent you detect and this is how you've parsed the structure,’” Pathak explained. “That was very, very similar to the first few agentic architectures that came out in terms of reason and defining a tool call.”
His team has since built out that architecture to include an LLM orchestrator that classifies queries, triggers retrieval-augmented generation (RAG) and calls APIs or smaller, specialized language models. “We've been able to scale that system quite well because it was so close in architecture that, with a few tweaks, we now have a full agentic stack,” said Pathak.
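The pattern Pathak describes, a lightweight intent classifier in front of an orchestrator that routes to an API call, a RAG pipeline, or a human agent, can be sketched roughly as follows. This is an illustrative assumption of the shape of such a stack, not Booking.com's actual code; the intent labels, handlers, and keyword rules standing in for the small BERT-scale model are all invented.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Route:
    name: str
    handler: Callable[[str], str]

def classify_intent(query: str) -> str:
    # Stand-in for a small, BERT-scale intent model; here, simple keyword rules.
    q = query.lower()
    if "refund" in q or "cancel" in q:
        return "self_service"
    if "policy" in q or "visa" in q:
        return "knowledge"
    return "escalate"

# Each intent maps to a handler: a structured API call, a RAG lookup,
# or escalation to a human agent. Handlers here are placeholders.
ROUTES = {
    "self_service": Route("self_service", lambda q: f"[API call] resolving: {q}"),
    "knowledge":    Route("knowledge",    lambda q: f"[RAG] retrieving docs for: {q}"),
    "escalate":     Route("escalate",     lambda q: f"[human agent] ticket opened: {q}"),
}

def orchestrate(query: str) -> str:
    """Classify the query, then dispatch to the matching tool flow."""
    intent = classify_intent(query)
    return ROUTES[intent].handler(query)
```

The key property, and the reason Pathak says the old architecture scaled into a full agentic stack "with a few tweaks," is that the classifier and the handlers are decoupled: swapping the keyword rules for an LLM, or adding a new tool flow, touches only one side of the routing table.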
As a result, Booking.com is seeing a 2X increase in topic-detection accuracy, which in turn is freeing up human agents’ bandwidth by 1.5 to 1.7X. More topics, even complicated ones previously identified as ‘other’ and requiring escalation, are being automated.
Ultimately, this supports more self-service, freeing human agents to focus on customers with uniquely specific problems that the platform doesn’t have a dedicated tool flow for — say, a family that is unable to access its hotel room at 2 a.m. when the front desk is closed.
That not only “really starts to compound,” but has a direct, long-term impact on customer retention, Pathak noted. “One of the things we've seen is, the better we are at customer service, the more loyal our customers are.”
Another recent rollout is personalized filtering. Booking.com has between 200 and 250 search filters on its website — an unrealistic amount for any human to sift through, Pathak pointed out. So, his team introduced a free text box that users can type into to immediately receive tailored filters.
“That becomes such an important cue for personalization in terms of what you're looking for in your own words rather than a clickstream,” said Pathak.
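Conceptually, the free-text box maps a customer's own words onto the catalog of 200-plus structured filters. A minimal sketch of that mapping, with invented filter names and plain substring matching standing in for the language model a production system would use:

```python
# Toy mapping from free-text phrases to structured search filters.
# Filter names and synonym lists are illustrative, not Booking.com's.
FILTER_SYNONYMS = {
    "hot_tub": ["hot tub", "jacuzzi", "whirlpool"],
    "pet_friendly": ["pet", "dog", "cat"],
    "sea_view": ["sea view", "ocean view", "beachfront"],
}

def filters_from_text(text: str) -> list[str]:
    """Return the structured filters whose synonyms appear in the query."""
    text = text.lower()
    return [
        f for f, phrases in FILTER_SYNONYMS.items()
        if any(p in text for p in phrases)
    ]
```

Beyond narrowing results, the raw text itself is the signal: aggregating unmatched phrases is how a platform discovers demand for a filter it doesn't yet have, as happened with hot tubs.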
In turn, it cues Booking.com into what customers actually want. For instance, hot tubs: when filter personalization first rolled out, jacuzzis were among the most popular requests. That hadn’t previously been a consideration; there wasn’t even a filter for it. Now that filter is live.
“I had no idea,” Pathak noted. “I had never searched for a hot tub in my room honestly.”
When it comes to personalization, though, there is a fine line; memory remains complicated, Pathak emphasized. While it’s important to have long-term memories and evolving threads with customers — retaining information like their typical budgets, preferred hotel star ratings or whether they need disability access — it must be on their terms and protective of their privacy.
Booking.com is extremely mindful with memory, seeking consent so as to not be “creepy” when collecting customer information.
“Managing memory is much harder than actually building memory,” said Pathak. “The tech is out there, we have the technical chops to build it. We want to make sure we don't launch a memory object that doesn't respect customer consent, that doesn't feel very natural.”
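A consent-first memory object of the kind Pathak alludes to could take a shape like the following: nothing is stored without an explicit opt-in, and revoking consent deletes everything already remembered. This is entirely a hypothetical sketch; the class and method names are invented.

```python
class ConsentedMemory:
    """Stores customer preferences only after explicit opt-in."""

    def __init__(self) -> None:
        self._consented: set[str] = set()
        self._store: dict[str, dict] = {}

    def grant_consent(self, customer_id: str) -> None:
        self._consented.add(customer_id)

    def revoke_consent(self, customer_id: str) -> None:
        # Revocation also deletes everything already remembered.
        self._consented.discard(customer_id)
        self._store.pop(customer_id, None)

    def remember(self, customer_id: str, key: str, value) -> bool:
        if customer_id not in self._consented:
            return False  # no consent, nothing stored
        self._store.setdefault(customer_id, {})[key] = value
        return True

    def recall(self, customer_id: str, key: str):
        return self._store.get(customer_id, {}).get(key)
```

The design choice worth noting is that consent is checked at write time, not read time: a preference like a typical budget or an accessibility need simply never enters the store unless the customer has opted in.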
Finding a balance of build versus buy
As agents mature, Booking.com is navigating a central question facing the entire industry: How narrow should agents become?
Instead of committing to either a swarm of highly specialized agents or a few generalized ones, the company aims for reversible decisions and avoids “one-way doors” that lock its architecture into long-term, costly paths. Pathak’s strategy is: Generalize where possible, specialize where necessary and keep agent design flexible to help ensure resiliency.
Pathak and his team are “very mindful” of use cases, evaluating where to build more generalized, reusable agents or more task-specific ones. They strive to use the smallest model possible, with the highest level of accuracy and output quality, for each use case. Whatever can be generalized is.
Latency is another important consideration. When factual accuracy and avoiding hallucinations are paramount, his team will use a larger, much slower model; but with search and recommendations, user expectations dictate speed. (Pathak noted: “No one’s patient.”)
“We would, for example, never use something as heavy as GPT-5 for just topic detection or for entity extraction,” he said.
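The "smallest model that clears the bar" rule can be expressed as a simple routing table over quality and latency requirements. The model names, scores, and latencies below are invented placeholders, not real benchmarks:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelSpec:
    name: str
    quality: float   # relative accuracy score, 0..1 (hypothetical)
    latency_ms: int  # typical inference latency (hypothetical)

# Hypothetical fleet, ordered cheapest/fastest first.
FLEET = [
    ModelSpec("small-intent-model", quality=0.85, latency_ms=20),
    ModelSpec("mid-reasoning-model", quality=0.93, latency_ms=300),
    ModelSpec("frontier-llm", quality=0.98, latency_ms=2500),
]

def pick_model(min_quality: float, max_latency_ms: int) -> Optional[ModelSpec]:
    """Return the cheapest model meeting both the quality and latency bars."""
    for spec in FLEET:  # cheapest first, so the first hit wins
        if spec.quality >= min_quality and spec.latency_ms <= max_latency_ms:
            return spec
    return None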
Booking.com takes a similarly elastic tack when it comes to monitoring and evaluations: If it’s general-purpose monitoring, a horizontal capability that someone else is better at building, they’ll buy it. But where brand guidelines must be enforced, they’ll build their own evals.
Ultimately, Booking.com has leaned into being “super anticipatory,” agile and flexible. “At this point with everything that's happening with AI, we are a little bit averse to walking through one-way doors,” said Pathak. “We want as many of our decisions to be reversible as possible. We don't want to get locked into a decision that we cannot reverse two years from now.”
What other builders can learn from Booking.com’s AI journey
Booking.com’s AI journey can serve as an important blueprint for other enterprises.
Looking back, Pathak acknowledged that they started out with a “pretty complicated” tech stack. They’re now in a good place with that, “but we probably could have started something much simpler and seen how customers interacted with it.”
Given that, he offered this valuable advice: If you’re just starting out with LLMs or agents, out-of-the-box APIs will do just fine. “There's enough customization with APIs that you can already get a lot of leverage before you decide you want to go do more.”
On the other hand, if a use case requires customization not available through a standard API call, that makes a case for in-house tools.
Still, he emphasized: Don't start with the complicated stuff. Tackle the “simplest, most pain…
Content automatically truncated.
🔗 Source: venturebeat.com
📌 MAROKO133 Breaking AI: AI “Research” Papers Are Complete Slop, Experts Say
There’s sloppy science, and there’s AI slop science.
In an ironic twist of fate, beleaguered AI researchers are warning that the field is being choked by a deluge of shoddy academic papers written with large language models, making it harder than ever for high quality work to be discovered and stand out.
Part of the problem is that AI research has surged in popularity. The more people who jump on the bandwagon, the more some try to speedrun an academic reputation by churning out dozens — and sometimes even hundreds — of papers a year, giving the entire pursuit a bad name.
In an interview with The Guardian, professor of computer science at UC Berkeley Hany Farid called the state of affairs a “frenzy.” With so much slop rising to the top, he says he now advises his students not to enter the field.
“So many young people want to get into AI,” Farid told The Guardian. “It’s just a mess. You can’t keep up, you can’t publish, you can’t do good work, you can’t be thoughtful.”
Farid stirred debate over the topic by calling out the output of an AI researcher named Kevin Zhu, who claims to have published 113 papers on AI this year.
“I can’t carefully read 100 technical papers a year,” Farid wrote in a LinkedIn post last month, “so imagine my surprise when I learned about one author who claims to have participated in the research and writing of over 100 technical papers in a year.”
Zhu, who recently received his bachelor’s in computer science at UC Berkeley — the same place where Farid teaches — launched an AI research program aimed at high schoolers and college students called Algoverse. Many of its participants are coauthors on Zhu’s papers, The Guardian noted. Each student pays $3,325 for a 12-week online course, during which they’re expected to submit work to AI conferences.
One of those conferences is NeurIPS, considered one of the big three conferences in a field that was once obscure but is now the center of attention as AI commands immense investment and social cachet. In 2020 it fielded fewer than 10,000 submissions, according to The Guardian. This year, that number has jumped to over 21,500, a trend shared by other major AI conferences. The explosion has been so extreme that NeurIPS is now relying on PhD students to help review its flood of submissions.
The overwhelming volume is thanks to people like Zhu: 89 of his more than 100 papers are being presented at NeurIPS this week.
Farid called Zhu’s papers a “disaster,” and added that he “could not have possibly meaningfully contributed” to them.
“I’m fairly convinced that the whole thing, top to bottom, is just vibe coding,” Farid said, invoking the new slang for rapidly building software with AI tools, a term that captures the attitude of reckless abandon the new crop of AI-dependent programmers brings to the practice.
Zhu would not confirm or deny whether his papers were written with AI when asked by The Guardian, but said his teams used “standard productivity tools such as reference managers, spellcheck, and sometimes language models for copy-editing or improving clarity.”
The role that AI has rapidly carved out in academic research has been a point of controversy ever since it first surged in popularity several years ago. Tools like ChatGPT are still prone to hallucinating citations, or inventing sources that do not exist, which often sneak through the peer review process of even prestigious journals. Other instances, such as when a peer-reviewed paper used an AI-generated diagram of a mouse with impossibly super-sized genitalia, make you question if there’s any oversight at all. The tech is so entrenched in academia that some clever authors are inserting hidden text into their papers designed to trick “reviewers” that are themselves AI-powered into giving positive assessments of their work.
What’s particularly disconcerting to hear now, however, is how AI research is beginning to be torn apart by the technology itself. How long can the pursuit survive its own product? And what does that mean for the upcoming generation of AI scientists, if novel research is being drowned out by far more prolific peers who are churning out studies with fabricated sources?
Even a seasoned vet like Farid says it’s now impossible to keep track of what’s happening in the AI field.
“You have no chance, no chance as an average reader to try to understand what is going on in the scientific literature,” Farid told The Guardian. “Your signal-to-noise ratio is basically one. I can barely go to these conferences and figure out what the hell is going on.”
More on AI: AI Researchers Say They’ve Invented Incantations Too Dangerous to Release to the Public
🔗 Source: futurism.com
🤖 MAROKO133 Note
This article is an automated summary compiled from several trusted sources. We select trending topics so you always stay up to date.