Explore Meta AI’s Self-Supervised Learning Demo For Images

Today, we are releasing the first-ever external demo based on Meta AI’s self-supervised learning work. We focus on Vision Transformers pretrained with DINO, a method we released last year that has grown in popularity based on its capacity to understand the semantic layout of an image.

Our choice to focus the first demo on DINO is motivated by its ability to learn both general and powerful semantic features, including patch-level matching and retrieval. Using the demo, people will be able to experience these advancements firsthand, including finding similar images or pieces of similar images, such as matching the eyes of a puppy to find similar-looking dogs, regardless of their position, location, or lighting in an image.

This illustration shows an example of patch-level retrieval.

While this may sound like a trivial use case, the technology underpinning this demo is part of the important bigger-picture future we are building at Meta AI. Computer vision powered by self-supervised learning is an important part of helping Meta AI researchers deliver AI systems that are more robust and less domain-centric in nature.

DINO enables AI researchers to build highly efficient computer vision systems that perform extremely well at a variety of tasks and are far less dependent on labeled data sets. For this to work, large-scale self-supervised training for computer vision needs two things: an algorithm that can learn from random, unlabeled images and videos, and a vast amount of data that captures the diversity of everyday life. Our new AI Research SuperCluster will allow us to explore the training of larger models on even larger data sets, pushing the boundaries of what self-supervised learning can achieve.

Using self-supervised learning to advance computer vision

While we previously released the DINO code, this demo allows researchers and engineers to explore how the model understands images, to test its robustness, and to try it on their own images. And it allows others who are interested in new AI techniques to see how a single technique can create models that are generic enough to solve many tasks.

There are three experiences people can explore in the demo. Through image retrieval, a person can select a picture and discover similar images from a third-party data set of five million images. Patch-level retrieval lets people select an object or area from an image to discover similar images, such as the dog eyes we mentioned earlier. Finally, patch-matching can find similar areas between two given images, despite differences in the background, positioning of objects, and lighting.
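Patch-matching can be pictured as a nearest-neighbor search over per-patch feature vectors. The sketch below is only an illustration of that idea, not the demo's actual implementation: the function names (`normalize`, `match_patches`) are hypothetical, and the two-dimensional feature values are made up for readability, whereas real patch features would come from the pretrained model and be much longer.

```python
import math

def normalize(v):
    """Scale a feature vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def match_patches(patches_a, patches_b):
    """For each patch in image A, return the index of the most similar patch in image B."""
    b_unit = [normalize(p) for p in patches_b]
    matches = []
    for p in patches_a:
        a_unit = normalize(p)
        # Cosine similarity is the dot product of unit-length vectors.
        sims = [sum(x * y for x, y in zip(a_unit, q)) for q in b_unit]
        matches.append(max(range(len(sims)), key=sims.__getitem__))
    return matches

# Made-up features for three patches per image (hypothetical values).
image_a = [[1.0, 0.1], [0.5, 0.5], [0.0, 1.0]]
image_b = [[0.1, 0.9], [0.9, 0.0], [0.6, 0.6]]

matches = match_patches(image_a, image_b)
print(matches)  # index in image_b of the best match for each patch of image_a
```

Because only the feature vectors are compared, a patch can be matched wherever it appears in the second image, which is why position and background matter little.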

When a person opens the demo and inputs an image or defines a patch of an image, DINO outputs features: a list of numbers that describes the image or patch and can be used to measure how similar it is to other images. These outputs are useful because they let us compute the distance between two images, in the same way we can compute the distance between two 3D points, each described by three numbers. (For example, an image of a cat is “far away” from the image of a car but close to the image of a dog, and even closer to the image of another cat.) It’s this distance property that powers the DINO demo and delivers results, whether retrieving the nearest image or using patch-matching to show the closest patch.
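To make the distance idea concrete, here is a minimal sketch of retrieval by feature distance. It assumes cosine distance as the metric, which is a common choice but not one the demo is confirmed to use. The three-dimensional feature values and the names `cosine_distance` and `nearest` are hypothetical and chosen only to mirror the cat, dog, and car example.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest(query, gallery):
    """Return gallery names sorted from closest to farthest from the query."""
    return sorted(gallery, key=lambda name: cosine_distance(query, gallery[name]))

# Made-up 3-dimensional features; real model features are far longer.
gallery = {
    "another_cat": [0.9, 0.1, 0.0],
    "dog": [0.6, 0.5, 0.1],
    "car": [0.0, 0.1, 0.9],
}
query_cat = [1.0, 0.2, 0.0]

ranking = nearest(query_cat, gallery)
print(ranking)  # ['another_cat', 'dog', 'car']
```

Image retrieval over a large data set follows the same pattern, usually with an approximate nearest-neighbor index in place of the full sort.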

DINO provides a training procedure to enable an untrained model to learn this property, without using any labeled data. It’s based on a simple intuition: Given an image, we apply several modifications and teach our model that the modified image should still be similar to the original image. These modifications include changing the brightness or contrast, cropping a smaller part of the image, or rotating the image. With each modification, the model can learn something. From rotating, it learns that a bunny in different poses will still represent the same thing, while the brightness modification will teach it that a bunny in the shadow is similar to a bunny in bright sunlight.
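The intuition above can be written as a loss that is small when two views of the same image produce matching outputs. The sketch below is a simplified, single-pair version of the student-teacher objective described in the DINO paper (a cross-entropy between a centered, sharpened teacher output and the student output). The function names, logit values, and temperatures are illustrative assumptions, and the rest of the training loop, including the teacher's moving-average update, is omitted.

```python
import math

def softmax(logits, temperature):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [v / temperature for v in logits]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def log_softmax(logits, temperature):
    """Numerically stable log of the temperature-scaled softmax."""
    scaled = [v / temperature for v in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(v - m) for v in scaled))
    return [v - log_total for v in scaled]

def view_consistency_loss(student_logits, teacher_logits, center,
                          student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between the teacher's and the student's output distributions."""
    # The teacher output is centered, then sharpened with a low temperature.
    teacher_probs = softmax(
        [t - c for t, c in zip(teacher_logits, center)], teacher_temp
    )
    student_log_probs = log_softmax(student_logits, student_temp)
    return -sum(t * s for t, s in zip(teacher_probs, student_log_probs))

# Hypothetical outputs: the teacher sees the original image, the student sees
# a modified view (for example, a darker crop).
teacher_out = [2.0, 0.5, 0.1]
center = [0.0, 0.0, 0.0]
agreeing_student = [1.8, 0.4, 0.2]     # student agrees with the teacher
disagreeing_student = [0.1, 0.3, 1.9]  # student disagrees with the teacher

loss_agree = view_consistency_loss(agreeing_student, teacher_out, center)
loss_disagree = view_consistency_loss(disagreeing_student, teacher_out, center)
print(loss_agree < loss_disagree)  # True
```

Minimizing this loss over many images and many random modifications pushes the model to give the same description to a bunny in shadow and a bunny in sunlight.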

While this model wasn’t developed with metaverse applications in mind, there are potential future applications for doing visual queries that can be personalized and remain entirely on a person’s device, which can help keep data more private. For example, you take a photo of an object to teach DINO “these are my car keys.” Later, when looking for your keys, you can query “Where are my car keys?” This type of application requires being able to memorize objects and find them in images — and this is something the DINO model can do well.

Duplicate image detection is another potential future use case. DINO-based models could help detect copies of a particular piece of harmful content, even when the image has been modified. We believe self-supervised learning advances will ultimately pave the way for a future where machine learning algorithms can be built on and stay on a person’s device, creating a more private and personalized future powered by AI assistants.

Exploring the DINO Demo

While we are only beginning to harness the potential of self-supervised learning, we believe it will be an important advancement as we help build the metaverse and new AR/VR experiences. Self-supervised learning helps us gain a deep understanding of real-world environments and how people experience them, both of which are too large and diverse to capture in labeled data sets. We’ll need AI that can learn from everything it sees and hears, and that’s only possible with self-supervised learning.

While DINO represents an advance in self-supervised learning and has many exciting potential future use cases, we want to make sure this demo is used in line with our commitment to open science and responsible AI. It is against the demo’s terms of use to upload photos of humans, and we include a detector to block human faces.

We invite everyone to try our demo. While self-supervised learning is still in its infancy, we are excited about the potential it holds for the future as we continue to work on more private and personalized AI projects.

Explore the demo

By Daniel Haziza, Research Engineer | Marc Szafraniec, Research Engineer | Pratik Ringshia, Research Engineer | Piotr Bojanowski, Research Scientist Manager | Patrick Labatut, Research Engineering Manager | Armand Joulin, Research Director
Source Meta AI

