The First High-Performance Self-Supervised Algorithm That Works For Speech, Vision, And Text

  • January 25, 2022
  • liwaiwai.com

Self-supervised learning — where machines learn by directly observing the environment rather than being explicitly taught through labeled images, text, audio, and other data sources — has powered many significant recent advances in AI. But while people appear to learn in a similar way regardless of how they get information — whether they use sight or sound, for example — there are currently big differences in the way self-supervised learning algorithms learn from images, speech, text, and other modalities.

This discrepancy has been a significant barrier to applying advances in self-supervised learning more broadly. Because a powerful algorithm designed for, say, understanding images can’t be directly applied to another modality, such as text, it is difficult to push several modalities ahead at the same rate.


This is why Meta AI developed and is excited to announce data2vec, the first high-performance self-supervised algorithm that works for multiple modalities. We applied data2vec separately to speech, images, and text, and it outperformed the previous best single-purpose algorithms for computer vision and speech while remaining competitive on NLP tasks. Data2vec represents a new paradigm of holistic self-supervised learning, where new research improves multiple modalities rather than just one, and it does not rely on contrastive learning or on reconstructing the input example. In addition to accelerating progress in AI, data2vec brings us closer to building machines that learn seamlessly about different aspects of the world around them. It will enable us to develop more adaptable AI, which we believe will be able to perform tasks beyond what today's systems can do.

As part of this announcement, we are sharing code and pretrained models on data2vec so that others in the research community can build upon our work.
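
As a rough illustration of what using those models looks like, here is a minimal sketch assuming the Hugging Face transformers port of the released checkpoints (the port and the hub model name postdate this post, so treat both as assumptions to verify):

```python
# A minimal sketch, assuming the Hugging Face `transformers` port of the
# released data2vec checkpoints; the hub name below is an assumption.
import torch
from transformers import AutoTokenizer, Data2VecTextModel

tokenizer = AutoTokenizer.from_pretrained("facebook/data2vec-text-base")
model = Data2VecTextModel.from_pretrained("facebook/data2vec-text-base")

inputs = tokenizer("Self-supervised learning scales.", return_tensors="pt")
with torch.no_grad():
    # Contextual representations learned without labels: (1, seq_len, 768).
    hidden = model(**inputs).last_hidden_state
```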

How data2vec works

Much of AI is still based on supervised learning, which works exclusively with labeled data. But it’s simply not possible to collect labeled data for all the things we would like machines to do. For example, while researchers have done a lot of work in creating large-scale labeled data sets for English speech and text, it is not feasible to do this for the literally thousands of languages spoken on the planet.

Self-supervision enables computers to learn about the world just by observing it and then figuring out the structure of images, speech, or text. Having machines that don’t need to be explicitly taught to classify images or understand spoken language is simply much more scalable.

Research in self-supervised learning today is almost always focused on one particular modality. So, researchers working on one modality often take a very different approach from those working on another. For text, researchers train models to fill in blanks in sentences. Speech models, however, need to learn an inventory of the basic sounds of speech in order to predict missing sounds. In computer vision, models are often trained to assign similar representations to a color image of a cow and the same image flipped upside down, so it associates the two much more closely than it would with an unrelated image, such as that of a duck.
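
To make the contrast concrete, the fill-in-the-blank text objective can be sketched as follows; this is a generic toy with made-up sizes, not Meta's training code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, MASK_ID = 1000, 0

# A deliberately tiny stand-in language model: embed, mix context, predict ids.
model = nn.Sequential(
    nn.Embedding(VOCAB, 64),
    nn.TransformerEncoder(
        nn.TransformerEncoderLayer(64, 4, batch_first=True), num_layers=2
    ),
    nn.Linear(64, VOCAB),
)

tokens = torch.randint(1, VOCAB, (8, 128))  # a batch of token ids
mask = torch.rand(tokens.shape) < 0.15      # hide roughly 15% of positions
inputs = tokens.masked_fill(mask, MASK_ID)

logits = model(inputs)                      # (8, 128, VOCAB)
loss = F.cross_entropy(logits[mask], tokens[mask])  # score masked slots only
loss.backward()
```

A speech or vision model would need an entirely different target inventory, which is exactly the coupling data2vec removes.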

Algorithms also predict different units for each modality: pixels or visual tokens for images, words for text, and learned inventories of sounds for speech. A collection of pixels is very different from an audio waveform or a passage of text, and because of this, algorithm design has been tied to a specific modality, meaning that algorithms still function differently in each one.

Data2vec simplifies this by training models to predict their own representations of the input data, regardless of the modality. By focusing on these representations — the layers of a neural network — instead of predicting visual tokens, words, or sounds, a single algorithm can work with completely different types of input. This removes the dependence on modality-specific targets in the learning task. Directly predicting representations is not straightforward, and it required defining a robust normalization of the features for the task that would be reliable in different modalities.
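
In code, that idea can be sketched as a target-building routine that only ever sees layer outputs, so the same function applies whether those states came from an image, speech, or text encoder (a schematic based on the description above, with illustrative names; the actual normalization choice varies per modality):

```python
import torch
import torch.nn.functional as F

def build_targets(layer_states, k=8):
    # layer_states: one (batch, time, dim) tensor per encoder layer, from an
    # encoder of ANY modality. Normalize each of the top-k layers and average
    # them, so the prediction target is a contextual feature vector rather
    # than pixels, words, or sounds. Schematic, not the released code.
    top_k = layer_states[-k:]
    normed = [F.layer_norm(h, h.shape[-1:]) for h in top_k]
    return sum(normed) / len(normed)
```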

Our method uses a teacher network to first compute target representations from an image, a piece of text, or a speech utterance. Next, we mask part of the input and repeat the process with a student network, which then predicts the latent representations of the teacher. The student model has to predict representations of the full input data even though it has a view of only some of the information. The teacher network is identical to the student model but with weights that are slightly out of date.
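
Put together, one training step looks roughly like the following self-contained toy (illustrative sizes and masking; the real models, span-masking strategy, and loss weighting differ):

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    # A small transformer that returns every layer's output, so the teacher
    # can expose the internal states used as targets.
    def __init__(self, dim=64, depth=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, 4, batch_first=True)
            for _ in range(depth)
        )

    def forward(self, x):
        states = []
        for layer in self.layers:
            x = layer(x)
            states.append(x)
        return states

student = TinyEncoder()
teacher = copy.deepcopy(student)      # identical architecture and weights
for p in teacher.parameters():
    p.requires_grad_(False)

x = torch.randn(2, 50, 64)            # embedded input from any modality
mask = torch.rand(2, 50) < 0.5        # positions hidden from the student

# 1) Teacher sees the full input; targets are normalized, averaged top layers.
with torch.no_grad():
    top_k = teacher(x)[-2:]
    targets = sum(F.layer_norm(h, h.shape[-1:]) for h in top_k) / len(top_k)

# 2) Student sees only a masked view but must predict the teacher's
#    latent representations at the masked positions.
preds = student(x.masked_fill(mask.unsqueeze(-1), 0.0))[-1]
loss = F.smooth_l1_loss(preds[mask], targets[mask])
loss.backward()

# 3) Teacher weights track the student's via an exponential moving average,
#    which is the concrete sense in which they are "slightly out of date".
tau = 0.999
with torch.no_grad():
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(tau).add_(p_s, alpha=1 - tau)
```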

We tested the method on the popular ImageNet computer vision benchmark, where it performed better than existing methods at popular model sizes. On speech, it performed better than wav2vec 2.0 and HuBERT, two previous Meta AI self-supervised algorithms for speech. For text, we tested it on the popular GLUE benchmark suite, and it performed as well as RoBERTa, a reimplementation of BERT.

Data2vec for computer vision: performance on the popular ImageNet benchmark for ViT-B models compared with other recent methods.

Data2vec for speech: performance for Base models on the LibriSpeech benchmark with 10h labeled data compared with other recent methods. Lower error rate indicates better performance.

Data2vec for text: performance on the GLUE natural language understanding benchmark for Base models compared with RoBERTa when retrained with the original BERT settings. Higher score indicates better performance.

Toward machines that learn from observing the world around them

While self-supervised learning has made great progress in computer vision, videos, and other individual modalities through different learning objectives, the core idea of this approach is to learn more generally: AI should be able to learn to do many different tasks, including those that are entirely unfamiliar. We want a machine to not only recognize the animals shown in its training data but also adapt to recognize new creatures if we tell it what they look like.

Data2vec demonstrates that the same self-supervised algorithm can work well in different modalities, and often better than the best existing algorithms. This paves the way for more general self-supervised learning and brings us closer to a world where AI might use videos, articles, and audio recordings to learn about complicated subjects, such as the game of soccer or different ways to bake bread.

We also hope data2vec will bring us closer to a world where computers need very little labeled data in order to accomplish tasks. Since it is difficult and sometimes impossible to collect annotated examples (to train speech recognition models for thousands of languages, for example), data2vec is an important step toward more general AI. This project complements research on general model architectures, and we hope that in the future we can remove the need for modality-specific feature extractors by combining these two lines of work.

Access the open source code and released pretrained models here, and read the paper here.

This blog post was made possible by the work of Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, and Michael Auli.
Source: Facebook AI Blog

