The First AI Model That Translates 100 Languages Without Relying On English Data

  • October 20, 2020
  • liwaiwai.com
  • Facebook AI is introducing M2M-100, the first multilingual machine translation (MMT) model that translates between any pair of 100 languages without relying on English data. It’s open sourced here.

  • When translating, say, Chinese to French, previous best multilingual models train on Chinese to English and English to French, because English training data is the most widely available. Our model directly trains on Chinese to French data to better preserve meaning. It outperforms English-centric systems by 10 points on the widely used BLEU metric for evaluating machine translations.

  • M2M-100 is trained on a total of 2,200 language directions — or 10x more than previous best, English-centric multilingual models. Deploying M2M-100 will improve the quality of translations for billions of people, especially those who speak low-resource languages.

  • This milestone is a culmination of years of Facebook AI’s foundational work in machine translation. Today, we’re sharing details on how we built a more diverse MMT training data set and model for 100 languages. We’re also releasing the model, training, and evaluation setup to help other researchers reproduce and further advance multilingual models.

Breaking language barriers through machine translation (MT) is one of the most important ways to bring people together, provide authoritative information on COVID, and keep them safe from harmful content. Today, we power an average of 20 billion translations every day on Facebook News Feed, thanks to our recent developments in low-resource machine translation and recent advances for evaluating translation quality.

Typical MT systems require building separate AI models for each language and each task, but this approach doesn’t scale effectively on Facebook, where people post content in more than 160 languages across billions of posts. Advanced multilingual systems can process multiple languages at once, but compromise on accuracy by relying on English data to bridge the gap between the source and target languages. We need one multilingual machine translation (MMT) model that can translate any language to better serve our community, nearly two-thirds of which use a language other than English.


In a culmination of many years of MT research at Facebook, we’re excited to announce a major milestone: the first single massive MMT model that can directly translate between any pair of 100 languages, in any direction, without relying on English-centric data. Our single multilingual model performs as well as traditional bilingual models and achieves a 10 BLEU point improvement over English-centric multilingual models.
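As a side note (not part of the original post), BLEU scores like the 10-point gap quoted above are corpus-level n-gram overlap scores. A minimal sketch of computing one with the sacrebleu library, using made-up hypothesis and reference sentences:

```python
# Minimal BLEU computation with sacrebleu (pip install sacrebleu).
# The hypothesis and reference sentences below are illustrative, not real system output.
import sacrebleu

hypotheses = ["Le chat est assis sur le tapis.", "Il fait beau aujourd'hui."]
references = ["Le chat est assis sur le tapis.", "Il fait très beau aujourd'hui."]

# corpus_bleu expects a list of hypotheses and a list of reference streams.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.1f}")
```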

Using novel mining strategies to create translation data, we built the first truly “many-to-many” data set with 7.5 billion sentences for 100 languages. We used several scaling techniques to build a universal model with 15 billion parameters, which captures information from related languages and reflects a more diverse set of scripts and morphologies. We’re open-sourcing this work here.

Mining hundreds of millions of sentences for thousands of language directions

One of the biggest hurdles of building a many-to-many MMT model is curating large volumes of quality sentence pairs (also known as parallel sentences) for arbitrary translation directions not involving English. It’s a lot easier to find translations for Chinese to English and English to French than, say, French to Chinese. What’s more, the volume of data required for training grows quadratically with the number of languages that we support. For instance, if we need 10M sentence pairs for each direction, then we need to mine 1B sentence pairs for 10 languages and 100B sentence pairs for 100 languages.
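To make the quadratic growth concrete, here is the back-of-the-envelope arithmetic behind those numbers (a sketch assuming roughly 10M pairs per direction, as in the paragraph above):

```python
# Rough arithmetic behind the figures above: data needs grow roughly quadratically
# with the number of supported languages.
PAIRS_PER_DIRECTION = 10_000_000  # ~10M sentence pairs per translation direction (assumption)

for n_langs in (10, 100):
    directions = n_langs * (n_langs - 1)        # ordered directions, e.g. fr->zh and zh->fr
    total_pairs = directions * PAIRS_PER_DIRECTION
    print(f"{n_langs} languages -> {directions} directions -> ~{total_pairs / 1e9:.0f}B sentence pairs")

# 10 languages  -> 90 directions   -> ~1B sentence pairs
# 100 languages -> 9900 directions -> ~99B sentence pairs (the ~100B quoted above)
```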


We took on this ambitious challenge of building the most diverse many-to-many MMT data set to date: 7.5 billion sentence pairs across 100 languages. This was made possible by combining complementary data mining resources that have been years in the making, including CCAligned, CCMatrix, and LASER. As part of this effort, we created a new LASER 2.0 and improved fastText language identification, which improve the quality of mining and include open-sourced training and evaluation scripts. All of our data mining resources leverage publicly available data and are open sourced.

Facebook AI’s new many-to-many multilingual model is the culmination of several years of pioneering work in MT across breakthrough models, data mining resources, and optimization techniques. We created our massive training data set by mining with CCNet, which builds on fastText, our pioneering work on processing word representations; our LASER library for CCMatrix, which embeds sentences in a multilingual embedding space; and CCAligned, our method for aligning documents based on URL matches. As part of this effort, we created LASER 2.0, which improves upon previous results.
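The details of CCMatrix/LASER mining live in the linked resources, but the core loop can be sketched roughly as follows (illustrative only; `encode` is a stand-in for a multilingual sentence encoder such as LASER, and the fastText model path is an assumption about where the pretrained language-ID model was saved):

```python
# Rough sketch of embedding-based bitext mining in the spirit of LASER/CCMatrix.
# Real pipelines use margin-based scoring and FAISS indexes over billions of sentences;
# this only shows the idea on small in-memory lists.
import numpy as np
import fasttext  # pip install fasttext

lid = fasttext.load_model("lid.176.bin")  # pretrained fastText language-ID model (downloaded separately)

def language_of(sentence: str) -> str:
    labels, _ = lid.predict(sentence.replace("\n", " "))
    return labels[0].replace("__label__", "")   # e.g. "fr", "zh"

def mine_pairs(src_sents, tgt_sents, encode, min_sim=0.8):
    """Pair each source sentence with its most similar target sentence and keep
    pairs above a cosine-similarity threshold. `encode` maps list[str] -> (n, d)
    L2-normalised embeddings (a stand-in for a LASER-style encoder)."""
    src_emb, tgt_emb = encode(src_sents), encode(tgt_sents)
    sims = src_emb @ tgt_emb.T                  # cosine similarity matrix
    best = sims.argmax(axis=1)
    return [(src_sents[i], tgt_sents[j])
            for i, j in enumerate(best) if sims[i, j] >= min_sim]
```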

Still, even with advanced underlying technologies like LASER 2.0, mining large-scale training data for arbitrary pairs of 100 different languages (4,950 possible language pairs) is highly computationally intensive. To make mining at this scale more manageable, we focused first on languages with the most translation requests. Consequently, we prioritized mining directions with the highest-quality and largest quantities of data. We avoided directions for which translation need is statistically rare, like Icelandic-Nepali or Sinhala-Javanese.

Next, we introduced a new bridge mining strategy, in which we group languages into 14 language groups based on linguistic classification, geography, and cultural similarities. We did this because people living in countries with languages of the same family tend to communicate more often and would benefit from high-quality translations. For instance, one group would include languages spoken in India, like Bengali, Hindi, Marathi, Nepali, Tamil, and Urdu. We systematically mined all possible language pairs within each group.

To connect the languages of different groups, we identified a small number of bridge languages, usually one to three major languages of each group. In the example above, Hindi, Bengali, and Tamil would be bridge languages for Indo-Aryan languages. We then mined parallel training data for all possible combinations of these bridge languages. Using this technique, our training data set ended up with 7.5 billion parallel sentences, corresponding to 2,200 directions. Since the mined data can be used to train both directions of a given language pair (e.g., en->fr and fr->en), this strategy lets us mine sparsely while still covering all 100×100 directions (9,900 in total) in one model.
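A small sketch of this bridge-language pairing scheme (the group membership and bridge choices below are illustrative, not the paper's full 14-group assignment):

```python
# Enumerate the language pairs to mine under the bridge strategy:
# all pairs within a group, plus all pairs of bridge languages across groups.
from itertools import combinations

groups = {
    "india":   ["bn", "hi", "mr", "ne", "ta", "ur"],  # Bengali, Hindi, Marathi, Nepali, Tamil, Urdu
    "romance": ["es", "fr", "it", "pt", "ro"],
}
bridges = {"india": ["hi", "bn", "ta"], "romance": ["fr", "es"]}  # 1-3 bridges per group

pairs = set()
for langs in groups.values():                       # 1) every pair inside each group
    pairs.update(combinations(sorted(langs), 2))
all_bridges = sorted({b for bs in bridges.values() for b in bs})
pairs.update(combinations(all_bridges, 2))          # 2) every pair of bridge languages

print(f"{len(pairs)} unordered pairs to mine -> {2 * len(pairs)} training directions")
```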


To supplement the parallel data for low-resource languages with low translation quality, we used the popular method of back-translation, which helped us win first place at the 2018 and 2019 WMT International Machine Translation competitions. If our goal is to train a Chinese-to-French translation model, for instance, we’d first train a model for French to Chinese and translate all of the monolingual French data to create synthetic, back-translated Chinese. We’ve found that this method is particularly effective at large scale, when translating hundreds of millions of monolingual sentences into parallel data sets. In our research setting, we used back-translation to supplement the training of directions we had already mined, adding the synthetic back-translated data to the mined parallel data. And we used back-translation to create data for previously unsupervised directions.
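As a rough illustration of that procedure (not Facebook's actual pipeline; `reverse_model.translate` is a stand-in for any trained French-to-Chinese system, not a specific API):

```python
# Back-translation sketch: turn monolingual French text into synthetic zh->fr training data.
def back_translate(monolingual_fr, reverse_model):
    """Translate human-written French into synthetic Chinese with an existing fr->zh
    model, then pair them so the *target* side of each pair is real text."""
    pairs = []
    for fr_sentence in monolingual_fr:
        zh_synthetic = reverse_model.translate(fr_sentence)  # assumed fr->zh model
        pairs.append((zh_synthetic, fr_sentence))            # (source, target) for zh->fr training
    return pairs

# The synthetic pairs are then mixed with mined parallel data when training zh->fr.
```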

Overall, the combination of our bridge strategy and back-translated data improved performance on the 100 back-translated directions by 1.7 BLEU on average compared with training on mined data alone. With a more robust, efficient, high-quality training set, we had a strong foundation for building and scaling our many-to-many model.

We also found impressive results on zero-shot settings, in which there’s no training data available for a pair of languages. For instance, if a model is trained on French-English and German-Swedish, we can zero-shot translate between French and Swedish. In settings where our many-to-many model must zero-shot the translation between non-English directions, it was substantially better than English-centric multilingual models.

 

Scaling our MMT model to 15 billion parameters with high speed and quality

One challenge in multilingual translation is that a single model must capture information in many different languages and diverse scripts. To address this, we saw a clear benefit of scaling the capacity of our model and adding language-specific parameters. Scaling the model size is particularly helpful for high-resource language pairs because they have the most data with which to train the additional model capacity. Ultimately, we saw an improvement of 1.2 BLEU averaged across all language directions when densely scaling the model size to 12 billion parameters, after which there were diminishing returns from densely scaling further. The combination of dense scaling and language-specific sparse parameters (3.2 billion) enabled us to create an even better model, with 15 billion parameters.

We compared our model with bilingual baselines and with English-centric multilingual models, starting from a 1.2-billion-parameter baseline with 24 encoder layers and 24 decoder layers. Scaling from 1.2 billion to 12 billion parameters adds a further 1.2 BLEU points of improvement.

To grow our model size, we increased the number of layers in our Transformer networks as well as the width of each layer. We found that large models converge quickly and train with high data efficiency. Notably, this many-to-many system is the first to leverage FairScale, the new PyTorch library specifically designed to support pipeline and tensor parallelism. We built this general infrastructure into FairScale to accommodate, through model parallelism, large-scale models that don’t fit on a single GPU. We built on top of the ZeRO optimizer, intra-layer model parallelism, and pipeline model parallelism to train large-scale models.
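For a flavour of what pipeline model parallelism looks like with FairScale, here is a minimal sketch assuming two visible GPUs; the Pipe API shown is from early FairScale releases and exact arguments may differ by version:

```python
# Pipeline parallelism sketch with FairScale's Pipe: split a stack of layers across GPUs
# and stream micro-batches through the partitions.
import torch
import torch.nn as nn
from fairscale.nn import Pipe

# A toy stack of Transformer encoder layers (far smaller than the 15B-parameter model).
layers = nn.Sequential(
    *[nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True) for _ in range(8)]
)

# Put 4 layers on each of 2 GPUs and split every batch into 4 micro-batches.
model = Pipe(layers, balance=[4, 4], chunks=4)

x = torch.randn(16, 32, 512, device="cuda:0")  # (batch, seq, d_model) on the first partition
y = model(x)                                   # micro-batches stream through both partitions
```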


But it’s not enough to simply scale the models to billions of parameters. To productionize this model in the future, we need to scale models as efficiently as possible with high-speed training. For example, much existing work uses multimodel ensembling, where multiple models are trained and applied to the same source sentence to produce a translation. To reduce the complexity and compute required to train multiple models, we explored multisource self-ensembling, which translates a source sentence in multiple languages to improve translation quality. Also, we built on our work with LayerDrop and Depth-Adaptive to jointly train a model with a common trunk and different sets of language-specific parameters. This approach is great for many-to-many models because it offers a natural way to split parts of a model by language pairs or language families. By combining dense scaling of model capacity with language-specific parameters (3B in total), we provide the benefits of large models as well as the ability to learn specialized layers for different languages.
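One way to picture the shared-trunk-plus-language-specific-parameters idea (an illustrative sketch, not the actual M2M-100 architecture; the group names are hypothetical):

```python
# Shared trunk with small language-group-specific blocks: most parameters are shared
# across all directions, and a sparse extra set is specialised per language family.
import torch
import torch.nn as nn

class SharedTrunkWithLanguageExperts(nn.Module):
    def __init__(self, d_model=512, nhead=8, shared_layers=6,
                 language_groups=("india", "romance", "other")):
        super().__init__()
        self.trunk = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True),
            num_layers=shared_layers,
        )
        # One lightweight specialised layer per language group; only one is used per batch.
        self.experts = nn.ModuleDict({
            g: nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for g in language_groups
        })

    def forward(self, x, language_group: str):
        h = self.trunk(x)                        # parameters shared by every direction
        return self.experts[language_group](h)   # language-group-specific parameters

model = SharedTrunkWithLanguageExperts()
x = torch.randn(4, 10, 512)                      # (batch, seq, d_model)
out = model(x, language_group="romance")
```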

 

On the path toward one multilingual model for all

For years, AI researchers have been working toward building a single universal model that can understand all languages across different tasks. A single model that supports all languages, dialects, and modalities will help us better serve more people, keep translations up to date, and create new experiences for billions of people equally. This work brings us closer to this goal.

As part of this effort, we’ve seen incredibly fast-paced progress in pretrained language models, fine-tuning, and self-supervision techniques. This collective research can further advance how our system understands text for low-resource languages using unlabeled data. For instance, XLM-R is our powerful multilingual model that can learn from data in one language and then execute a task in 100 languages with state-of-the-art accuracy. mBART is one of the first methods for pretraining a complete model for BART tasks across many languages. And most recently, our new self-supervised approach, CRISS, uses unlabeled data from many different languages to mine parallel sentences across languages and train new, better multilingual models in an iterative way.

We’ll continue to improve our model by incorporating such cutting-edge research, exploring ways to deploy MT systems responsibly, and creating the more specialized computation architectures necessary to bring this to production.

Get it on GitHub:

https://github.com/pytorch/fairseq/tree/master/examples/m2m_100
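The repo above ships fairseq scripts for downloading and running the released checkpoints. As an alternative illustration (an assumption about tooling, not part of the original post), the smaller public checkpoints are also usable through the Hugging Face transformers wrappers:

```python
# Translate Chinese directly to French with a public M2M-100 checkpoint via transformers.
# pip install transformers sentencepiece
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")

tokenizer.src_lang = "zh"                       # source language: Chinese
inputs = tokenizer("生活就像一盒巧克力。", return_tensors="pt")
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.get_lang_id("fr"),  # force French as the target language
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```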

Read the paper:

https://ai.facebook.com/research/publications/beyond-english-centric-multilingual-machine-translation

By Angela Fan, Research Assistant

Source https://ai.facebook.com/blog/introducing-many-to-many-multilingual-machine-translation/

