Liwaiwai Liwaiwai
  • /
  • Artificial Intelligence
  • Machine Learning
  • Robotics
  • Engineering
    • Architecture
    • Design
    • Software
    • Hybrid Cloud
    • Data
  • About
Liwaiwai Liwaiwai
  • /
  • Artificial Intelligence
  • Machine Learning
  • Robotics
  • Engineering
    • Architecture
    • Design
    • Software
    • Hybrid Cloud
    • Data
  • About
  • Artificial Intelligence

Powered By AI: Turning Any 2D Photo Into 3D Using Convolutional Neural Nets

  • March 6, 2020
  • relay

3D Photos feature on Facebook launched in 2018 as a new, immersive format for sharing pictures with friends and family. The feature has relied on the dual-lens “portrait mode” capabilities available only in new, higher-end smartphones, however. So it hasn’t been available on typical mobile devices, which have only a single, rear-facing camera. To bring this new visual format to more people, we have used state-of-the-art machine learning techniques to produce 3D photos from virtually any standard 2D picture. This system infers the 3D structure of any image, whether it is a new shot just taken on an Android or iOS device with a standard single camera, or a decades-old image recently uploaded to a phone or laptop.

This advance makes 3D photo technology easily accessible for the first time to the many millions of people who use single-lens camera phones or tablets. It also allows everyone to experience decades-old family photos and other treasured images in a new way, by converting them to 3D. People with state-of-the-art dual-camera devices can benefit, too, since they can now use their single, front-facing camera to take 3D selfies. Anyone with an iPhone 7 or higher, or a recent midrange or better Android device, can now try these options in the Facebook app.

Image result for facebook 2d to 3d neural net

Building this enhanced 3D Photos technique required overcoming a variety of technical challenges, such as training a model that correctly infers 3D positions of an extremely wide variety of subject matter and optimizing the system so that it works on-device on typical mobile processors in a fraction of a second. To overcome these challenges, we trained a convolutional neural network (CNN) on millions of pairs of public 3D images and their accompanying depth maps, and leveraged a variety of mobile-optimization techniques previously developed by Facebook AI, such as FBNet and ChamNet.

Delivering highly efficient performance on mobile devices

Given a standard RGB image, the 3D Photos CNN can estimate a distance from the camera for each pixel. We accomplished this through four means:

  • A network architecture built with a set of parameterizable, mobile-optimized neural building blocks.

  • Automated architecture search to find an effective configuration of these blocks, enabling the system to perform the task in under a second on a wide range of devices.

  • Quantization-aware training to leverage high-performance INT8 quantization on mobile while minimizing potential quality degradation from the quantization process.

  • Large amounts of training data derived from public 3D photos.

Read More  Microsoft‘s Big AI Ambitions Go Beyond Just OpenAI And ChatGPT

Neural building blocks

Our architecture uses building blocks inspired by FBNet, a framework to optimize ConvNet architectures for mobile and other resource-constrained devices. A building block consists of point-wise convolution, optional upsampling, K x K depthwise convolution, and an additional point-wise convolution. We implement a U-net style architecture that has been modified to place FBNet building blocks along the skip connection. The U-net encoder and decoder each contain five stages, each corresponding to a different spatial resolution.

Automated architecture search

In order to find an effective architecture configuration, we automated the search process using ChamNet, an algorithm developed by Facebook AI. The ChamNet algorithm iteratively samples points from the search space to train an accuracy predictor. This accuracy predictor is used to accelerate a genetic search to find a model that maximizes predicted accuracy while satisfying specified resource constraints. In this setting, we used a search space that varies the channel expansion factor and number of output channels per block, resulting in 3.4×1022 possible architectures. We then completed the search in approximately three days using 800 Tesla V100 GPUs, setting and then adjusting a FLOP constraint on the model architecture in order to achieve different operating points.

Quantization-aware training

By default, our model is trained using single-precision floating point weights and activations, but we found significant advantages to quantizing both weights and activations to be only 8 bits. In particular, int8 weights require only a quarter of the storage required of float32 weights, thereby reducing the number of bytes that must be transferred to the device on first use.

Int8-based operators also have much higher throughput compared with their float32 counterparts, thanks to well-tuned libraries such as Facebook AI’s QNNPACK, which has been integrated into PyTorch. We used quantization-aware training (QAT) to avoid an unacceptable drop in quality due to quantization. QAT, which is now available as part of PyTorch, simulates quantization during training and supports back propagation, thereby eliminating the gap between training and production performance.

Finding new ways to create 3D experiences

In addition to refining and improving our depth estimation algorithm, we’re working toward enabling high-quality depth estimation for videos taken with mobile devices. Videos pose a noteworthy challenge, since each frame depth must be consistent with the next. But it is also an opportunity to improve performance, since multiple observations of the same objects can provide additional signal for highly accurate depth estimations. Video-length depth estimation will open up a variety of innovative content creation tools to our users. As we continue to improve the performance of our neural network, we will also explore leveraging depth estimation, surface normal estimation, and spatial reasoning in real-time applications such as augmented reality.

Read More  Harmful Content Can Evolve Quickly. Facebook's New AI System Adapts To Tackle It.

Beyond these potential new experiences, this work will help us better understand the content of 2D images more generally. Improved understanding of 3D scenes could also help robots navigate and interact with the physical world. We hope that by sharing details about our 3D Photos system, we will help the AI community make progress in these areas and create new experiences that leverage advanced 3D understanding.

 

Kevin Matzen, Matthew Yu, Jonathan Lehman, Peizhao Zhang, Jan-Michael Frahm, Peter Vajda, Johannes Kopf, Matt Uyttendaele

Source: Facebook AI

relay

Related Topics
  • ChamNet
  • Facebook AI
  • FBNet
You May Also Like
View Post
  • Artificial Intelligence
  • Technology

Unlocking The Secrets Of ChatGPT: Tips And Tricks For Optimizing Your AI Prompts

  • March 29, 2023
View Post
  • Artificial Intelligence
  • Technology

Try Bard And Share Your Feedback

  • March 29, 2023
View Post
  • Artificial Intelligence
  • Data
  • Data Science
  • Machine Learning
  • Technology

Google Data Cloud & AI Summit : In Less Than 12 Hours From Now

  • March 29, 2023
View Post
  • Artificial Intelligence
  • Technology

Talking Cars: The Role Of Conversational AI In Shaping The Future Of Automobiles

  • March 28, 2023
View Post
  • Artificial Intelligence
  • Tools

Document AI Introduces Powerful New Custom Document Classifier To Automate Document Processing

  • March 28, 2023
View Post
  • Artificial Intelligence
  • Design
  • Practices

How AI Can Improve Digital Security

  • March 27, 2023
View Post
  • Artificial Intelligence
  • Machine Learning
  • Technology

ChatGPT 4.0 Finally Gets A Joke

  • March 27, 2023
View Post
  • Artificial Intelligence
  • Machine Learning
  • Technology

Mr. Cooper Is Improving The Home-buyer Experience With AI And ML

  • March 24, 2023

Leave a Reply

Your email address will not be published. Required fields are marked *

Stay Connected!
LATEST
  • 1
    Unlocking The Secrets Of ChatGPT: Tips And Tricks For Optimizing Your AI Prompts
    • March 29, 2023
  • 2
    Try Bard And Share Your Feedback
    • March 29, 2023
  • 3
    Google Data Cloud & AI Summit : In Less Than 12 Hours From Now
    • March 29, 2023
  • 4
    Talking Cars: The Role Of Conversational AI In Shaping The Future Of Automobiles
    • March 28, 2023
  • 5
    Document AI Introduces Powerful New Custom Document Classifier To Automate Document Processing
    • March 28, 2023
  • 6
    How AI Can Improve Digital Security
    • March 27, 2023
  • 7
    ChatGPT 4.0 Finally Gets A Joke
    • March 27, 2023
  • 8
    Mr. Cooper Is Improving The Home-buyer Experience With AI And ML
    • March 24, 2023
  • 9
    My First Pull Request At Age 14
    • March 24, 2023
  • 10
    The 5 Podcasts To Check If You Want To Get Up To Speed On AI
    • March 24, 2023

about
About
Hello World!

We are liwaiwai.com. Created by programmers for programmers.

Our site aims to provide materials, guides, programming how-tos, and resources relating to artificial intelligence, machine learning and the likes.

We would like to hear from you.

If you have any questions, enquiries or would like to sponsor content, kindly reach out to us at:

[email protected]

Live long & prosper!
Most Popular
  • 1
    GPT-4 : The Latest Milestone From OpenAI
    • March 24, 2023
  • 2
    Ditching Google: The 3 Search Engines That Use AI To Give Results That Are Meaningful
    • March 23, 2023
  • 3
    Peacock: Tackling ML Challenges By Accelerating Skills
    • March 23, 2023
  • 4
    Coop Reduces Food Waste By Forecasting With Google’s AI And Data Cloud
    • March 23, 2023
  • 5
    Gods In The Machine? The Rise Of Artificial Intelligence May Result In New Religions
    • March 23, 2023
  • /
  • Artificial Intelligence
  • Machine Learning
  • Robotics
  • Engineering
  • About

Input your search keywords and press Enter.