Meta returns to open source AI with Omnilingual ASR models that can transcribe 1,600+ languages

Meta has just released a new multilingual automatic speech recognition (ASR) system supporting 1,600+ languages — dwarfing OpenAI’s open source Whisper model, which supports just 99.

Its architecture also allows developers to extend that support to thousands more. Through a feature called zero-shot in-context learning, users can provide a few paired examples of audio and text in a new language at inference time, enabling the model to transcribe additional utterances in that language without any retraining.

In practice, this expands potential coverage to more than 5,400 languages — roughly every spoken language with a known script.

It’s a shift from static model capabilities to a flexible framework that communities can adapt themselves. So while the 1,600 languages reflect official training coverage, the broader figure represents Omnilingual ASR’s capacity to generalize on demand, making it the most extensible speech recognition system released to date.
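To make the in-context workflow concrete, here is a minimal sketch of what supplying paired examples at inference time could look like. The class and function names below are hypothetical placeholders, not the actual Omnilingual ASR API, which ships with its own tooling.

```python
from dataclasses import dataclass

# Illustrative sketch only: these names are hypothetical placeholders,
# not the real Omnilingual ASR interface.

@dataclass
class ContextExample:
    audio_path: str   # short recording in the new language
    transcript: str   # its verified written transcription

def build_zero_shot_request(examples: list[ContextExample], target_audio: str) -> dict:
    """Bundle a handful of paired (audio, text) examples with the utterance
    to transcribe; the model conditions on these pairs at inference time
    instead of being retrained on the new language."""
    return {
        "context": [(ex.audio_path, ex.transcript) for ex in examples],
        "target": target_audio,
    }

if __name__ == "__main__":
    examples = [
        ContextExample("greeting.wav", "transcript of the greeting"),
        ContextExample("question.wav", "transcript of the question"),
    ]
    request = build_zero_shot_request(examples, "new_utterance.wav")
    print(request)  # this bundle would be handed to the zero-shot decoder
```

The key point is that adaptation happens entirely in the request: no gradient updates and no new checkpoint are needed for the new language.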

Best of all, it has been open sourced under a plain Apache 2.0 license rather than the restrictive, quasi open-source Llama license of the company's prior releases, which limited use by larger enterprises unless they paid licensing fees. Researchers and developers are therefore free to adopt it right away, at no cost and without restrictions, even in commercial and enterprise-grade projects.

Released on November 10 on Meta's website and GitHub, along with a demo space on Hugging Face and a technical paper, the Omnilingual ASR suite includes a family of speech recognition models, a 7-billion-parameter multilingual audio representation model, and a massive speech corpus spanning more than 350 previously underserved languages.

All resources are freely available under open licenses, and the models support speech-to-text transcription out of the box.

“By open sourcing these models and dataset, we aim to break down language barriers, expand digital access, and empower communities worldwide,” Meta posted on its @AIatMeta account on X.

Designed for Speech-to-Text Transcription

At its core, Omnilingual ASR is a speech-to-text system.

The models are trained to convert spoken language into written text, supporting applications like voice assistants, transcription tools, subtitles, oral archive digitization, and accessibility features for low-resource languages.

Unlike earlier ASR models that required extensive labeled training data, Omnilingual ASR includes a zero-shot variant.

This version can transcribe languages it has never seen before—using just a few paired examples of audio and corresponding text.

This lowers the barrier for adding new or endangered languages dramatically, removing the need for large corpora or retraining.

Model Family and Technical Design

The Omnilingual ASR suite includes multiple model families trained on more than 4.3 million hours of audio from 1,600+ languages:

  • wav2vec 2.0 models for self-supervised speech representation learning (300M–7B parameters)

  • CTC-based ASR models for efficient supervised transcription

  • LLM-ASR models combining a speech encoder with a Transformer-based text decoder for state-of-the-art transcription

  • LLM-ZeroShot ASR model, enabling inference-time adaptation to unseen languages

All models follow an encoder–decoder design: raw audio is converted into a language-agnostic representation, then decoded into written text.
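As an illustration of the encoder-plus-CTC pattern described above, the snippet below runs a standard wav2vec 2.0 CTC model through the Hugging Face transformers interface. The checkpoint name is a public English model used purely as a stand-in; Omnilingual ASR distributes its own checkpoints and tooling, which may expose a different loading path.

```python
import torch
import soundfile as sf
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Generic wav2vec 2.0 + CTC transcription; the checkpoint below is a
# stand-in, not one of the Omnilingual ASR releases.
model_id = "facebook/wav2vec2-base-960h"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

speech, sample_rate = sf.read("utterance.wav")                 # 16 kHz mono audio
inputs = processor(speech, sampling_rate=sample_rate, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits                            # frame-level character scores

predicted_ids = torch.argmax(logits, dim=-1)                   # greedy CTC decoding
print(processor.batch_decode(predicted_ids)[0])
```

In the LLM-ASR variants, the CTC head is replaced by a Transformer text decoder, which is what the suite relies on for its state-of-the-art transcription quality.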

Why the Scale Matters

While Whisper and similar models have advanced ASR capabilities for global languages, they fall short on the long tail of human linguistic diversity. Whisper supports 99 languages. Meta’s system:

  • Directly supports 1,600+ languages

  • Can generalize to 5,400+ languages using in-context learning

  • Achieves character error rates (CER) under 10% in 78% of supported languages (see the sketch below for how CER is computed)
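For reference, character error rate is the character-level edit distance between the model's output and the reference transcript, divided by the length of the reference. A minimal, self-contained implementation:

```python
def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = character-level Levenshtein distance / length of the reference."""
    m, n = len(reference), len(hypothesis)
    prev = list(range(n + 1))                   # distances for the empty reference prefix
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # deletion
                          curr[j - 1] + 1,      # insertion
                          prev[j - 1] + cost)   # substitution
        prev = curr
    return prev[n] / max(m, 1)

# One wrong character in an 11-character word gives a CER of roughly 9%,
# just under the 10% threshold Meta reports for 78% of supported languages.
print(character_error_rate("omnilingual", "omnilinguel"))  # ≈ 0.091
```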

Among those supported are more than 500 languages never previously covered by any ASR model, according to Meta’s research paper.

This expansion opens new possibilities for communities whose languages are often excluded from digital tools.


Background: Meta’s AI Overhaul and a Rebound from Llama 4

The release of Omnilingual ASR arrives at a pivotal moment in Meta’s AI strategy, following a year marked by organizational turbulence, leadership changes, and uneven product execution.

Omnilingual ASR is the first major open-source model release since the rollout of Llama 4, Meta’s latest large language model, which debuted in April 2025 to mixed and ultimately poor reviews, with scant enterprise adoption compared to Chinese open source model competitors.

The failure led Meta founder and CEO Mark Zuckerberg to appoint Alexandr Wang, co-founder and former CEO of AI data supplier Scale AI, as Chief AI Officer and to embark on an extensive and costly hiring spree that shocked the AI and business communities with eye-watering pay packages for top AI researchers.

In contrast, Omnilingual ASR represents a strategic and reputational reset. It returns Meta to a domain where the company has historically led — multilingual AI — and offers a truly extensible, community-oriented stack with minimal barriers to entry.

The system’s support for 1,600+ languages, and its zero-shot extensibility to more than 5,400 in total, reasserts Meta’s engineering credibility in language technology.

Importantly, it does so through a free and permissively licensed release, under Apache 2.0, with transparent dataset sourcing and reproducible training protocols.

This shift aligns with broader themes in Meta’s 2025 strategy. The company has refocused its narrative around a “personal superintelligence” vision, investing heavily in infrastructure (including a September release of custom AI accelerators and Arm-based inference stacks) while downplaying the metaverse in favor of foundational AI capabilities. The return to public training data in Europe after a regulatory pause also underscores its intention to compete globally, despite privacy scrutiny.

Omnilingual ASR, then, is more than a model release — it’s a calculated move to reassert control of the narrative: from the fragmented rollout of Llama 4 to a high-utility, research-grounded contribution that …

[Content automatically truncated.]

🔗 Source: venturebeat.com



