A central focus of Facebook’s AI efforts is deploying cutting-edge machine learning technology to protect people from harmful content. With billions of people using our platforms, we rely on AI to scale our content review work and automate decisions when possible. Our goal is to spot hate speech, misinformation, and other forms of policy-violating content quickly and accurately, for every form of content, and for every language and community around the world.
Fully achieving this will require many more technical advances in AI. But we have made real progress, and today our automated systems are the first line of review for more content, across all types of violations. Our continued investment and improvement can be seen in the Community Standards Enforcement Report we published today.
AI now proactively detects 94.7 percent of hate speech we remove from Facebook, up from 80.5 percent a year ago and up from just 24 percent in 2017.
This progress has been driven by advances in our automated detection tools, such as the deployment of XLM, Facebook AI's method of training language systems across multiple languages without relying on hand-labeled data sets. We've recently deployed new systems that have further improved our ability to detect hate speech. These include Reinforced Integrity Optimizer, a system that learns from real examples and metrics, and our Linformer AI architecture, which enables us to use cutting-edge language-understanding models that were previously too big and unwieldy to work at scale.
Hate speech, of course, is not the only content we need to catch. In this blog post, we're also sharing new details about the AI technology we use to combat misinformation. These include improvements to SimSearchNet, the image-matching tool we detailed earlier this year, and a new deepfake detection tool that can learn and adapt over time.
Taken together, all these innovations mean our AI systems have a deeper, broader understanding of content. They are more attuned to what people share on our platforms right now, so they can adapt more quickly when a new meme or photo emerges and spreads.
These challenges are complex, nuanced, and rapidly evolving. It’s crucial not just to detect problems but also to avoid mistakes, since misclassifying a piece of content as misinformation or hate speech can hamper people’s ability to express themselves on our platform. Continued improvement will require not only better technology but also operational excellence and effective partnerships with outside experts.
While we are constantly improving our AI tools, they are far from perfect. We know there’s more work to be done, and we are building and testing new systems to help us do more to protect the people who use our platform.
End-to-end learning and efficient new models to catch hate speech
Detecting some hate speech is straightforward for both AI and humans. Slurs and hateful symbols represent obvious attacks on a person’s race or religion. But many instances of hate speech are more complex. Combining text and images makes it harder for AI to grasp whether the intended meaning is offensive even when a human might find it obvious. Hate speech can also be disguised with sarcasm or slang, or with seemingly innocuous images that are perceived differently in different cultures, regions, and languages.
To better handle these challenges, we’ve built and deployed a new reinforcement learning framework called Reinforced Integrity Optimizer (RIO). Rather than relying on a static (and thus limited) offline data set, RIO directly uses online, real-world data from our production systems to optimize the AI models that detect hate speech. RIO works on the entire machine learning development life cycle, from data sampling to A/B testing.
In traditional AI systems, engineers optimize these steps in two separate processes: offline optimization and online experimentation. Offline, engineers can use a variety of tools to help pick the right train/test mix and the right neural architecture, but those choices are based only on offline metrics, such as precision and recall. The model is then optimized online, using A/B testing to measure its impact on online user and business metrics.
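Precision and recall, the offline metrics mentioned above, are computed on a held-out labeled set. The minimal Python sketch below uses made-up labels purely for illustration; the function name is hypothetical and this is not Facebook's evaluation code.

```python
def precision_recall(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Precision and recall for binary labels (1 = violating, 0 = benign)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # violations caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign content flagged by mistake
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # violations missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical held-out labels and model predictions, for illustration only.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
preds = [1, 0, 1, 1, 0, 0, 1, 0]
print(precision_recall(labels, preds))  # (0.75, 0.75)
```

Precision falls when benign posts are wrongly flagged, and recall falls when violations slip through, which is why both matter for a hate speech classifier. Neither number, however, says how a model will affect people once it is live.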
RIO takes a completely different approach. It optimizes all the steps in this process together and uses actual production performance metrics to tune the offline components. This level of end-to-end optimization allows for significantly better performance and faster iteration. Instead of doing the work offline and then deploying it to production to see whether it works as anticipated, RIO learns directly from how the online system performs on real content.
RIO enables us to focus our model training on violations of our hate speech policy that were previously missed by our production models.
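The post does not publish RIO's internals. To illustrate the general idea of letting an online metric steer an offline decision, such as which training data to sample, here is a much simpler, swapped-in technique: an epsilon-greedy multi-armed bandit. The strategy names, the simulated reward, and all numbers are hypothetical, and this is not how RIO is implemented.

```python
import random


class EpsilonGreedy:
    """Epsilon-greedy bandit over hypothetical training-data sampling strategies."""

    def __init__(self, arms: list[str], epsilon: float = 0.1, seed: int = 0) -> None:
        self.arms = arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {arm: 0 for arm in arms}
        self.values = {arm: 0.0 for arm in arms}  # running mean reward per arm

    def select(self) -> str:
        # Explore a random arm with probability epsilon, else exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda arm: self.values[arm])

    def update(self, arm: str, reward: float) -> None:
        # Incremental mean update: v <- v + (r - v) / n.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


# Hypothetical stand-in for an online metric measured after deploying a model
# trained with each sampling strategy; a real system would observe this in production.
TRUE_MEAN_REWARD = {"uniform": 0.50, "recent_reports": 0.60, "missed_by_prod": 0.70}
env_rng = random.Random(42)


def observe_online_metric(arm: str) -> float:
    return TRUE_MEAN_REWARD[arm] + env_rng.gauss(0.0, 0.05)


bandit = EpsilonGreedy(list(TRUE_MEAN_REWARD))
for _ in range(500):
    arm = bandit.select()
    bandit.update(arm, observe_online_metric(arm))

best = max(bandit.values, key=bandit.values.get)
print(best, bandit.counts)
```

Because the reward comes from the (simulated) live metric rather than an offline score, the loop drifts toward whichever data strategy pays off in production; here that is the hypothetical "missed_by_prod" arm, echoing the focus on previously missed violations.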
Using the right training data from RIO, we can then use cutting-edge content-understanding models to build effective classifiers. In recent years, researchers have dramatically improved language models' performance by using Transformers, an architecture that uses an attention mechanism to learn which parts of the text to focus on. But the largest state-of-the-art Transformer models can have billions of parameters, and the self-attention computation at their core grows quadratically with the length N of the input sequence, typically expressed as O(N^2). As a result, they're not efficient enough to deploy in production at scale and work in near real time to detect hate speech.
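For reference, the quadratic cost comes from standard scaled dot-product self-attention as defined in the original Transformer paper. With N input tokens and head dimension d, the query, key, and value matrices Q, K, and V are each N × d:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right) V,
\qquad QK^{\top} \in \mathbb{R}^{N \times N}
```

Forming the N × N score matrix takes O(N^2 d) time and O(N^2) memory, so doubling the input from 512 to 1,024 tokens quadruples the score matrix from 262,144 to 1,048,576 entries.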
To increase the efficiency of Transformer models, Facebook AI researchers developed a new architecture called Linformer. Linformer approximates the attention mechanism with low-rank projections, so its cost grows linearly rather than quadratically with sequence length. This provides a vastly more efficient way to use massive, cutting-edge models to understand content. We now use RIO and Linformer in production to analyze Facebook and Instagram content in regions around the world. We have shared our work and the Linformer code with the AI research community so that we can build on one another's work and accelerate progress for everyone.
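The core Linformer idea, as described in the public Linformer paper, is to project the keys and values from N rows down to a fixed k rows with projection matrices E and F (each k × N) before computing attention, so the score matrix is N × k instead of N × N. The pure-Python sketch below is illustrative only: the helper names and sizes are hypothetical, E and F are random here whereas the real model learns them, and it is not Facebook's production code.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Naive product of an (n x m) and an (m x p) matrix.
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    out = []
    for row in a:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([x_exp / total for x_exp in exps])
    return out


def attention(q: Matrix, k: Matrix, v: Matrix) -> tuple[Matrix, int]:
    """Scaled dot-product attention; also returns the size of the score matrix."""
    d = len(q[0])
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v), len(scores) * len(scores[0])


def linformer_attention(q: Matrix, k: Matrix, v: Matrix, e: Matrix, f: Matrix) -> tuple[Matrix, int]:
    """Linformer-style attention: E and F (each k_proj x N) shrink K and V to
    k_proj rows, so the score matrix is N x k_proj instead of N x N."""
    return attention(q, matmul(e, k), matmul(f, v))


rng = random.Random(0)


def rand_matrix(rows: int, cols: int, scale: float = 1.0) -> Matrix:
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


n, d, k_proj = 64, 8, 16  # hypothetical sequence length, head size, projected length
q, k, v = (rand_matrix(n, d) for _ in range(3))
# In Linformer, E and F are learned; random projections stand in for them here.
e, f = rand_matrix(k_proj, n, 1 / math.sqrt(n)), rand_matrix(k_proj, n, 1 / math.sqrt(n))

full_out, full_entries = attention(q, k, v)
lin_out, lin_entries = linformer_attention(q, k, v, e, f)
print(full_entries, lin_entries)  # 4096 1024
print(len(lin_out), len(lin_out[0]))  # 64 8
```

The output keeps its N × d shape, but the score matrix shrinks from N × N to N × k_proj. Holding k_proj fixed while N grows is what makes the cost linear in sequence length.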
Self-supervised, holistic, and multimodal understanding
RIO and Linformer help us train our systems more effectively and use advanced content-understanding models. But instances of hate speech or incitement to violence can be more complex than a single piece of text. In some of the toughest examples, we see part of the hateful content conveyed via an image or video and another part conveyed via text. Taken in isolation, each of these pieces may be benign. But put together, they become offensive.
As these examples show, such scenarios require a more holistic understanding of the content in question. To make more progress toward this goal, we created Whole Post Integrity Embeddings (WPIE), a pretrained universal representation of content for integrity problems. WPIE works by trying to understand content across modalities, violation types, and even time. Our latest version is trained on more violations and on more training data overall. The system improves performance across modalities, such as text and image, by using focal loss, which prevents easy-to-classify examples from overwhelming the detector during training, along with gradient blending, which computes an optimal blend of modalities based on their overfitting behavior.
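The post names these two training techniques for WPIE without giving their details. The sketch below shows their textbook forms as we understand them: binary focal loss as defined by Lin et al. (2017), FL(p_t) = -α_t (1 - p_t)^γ log(p_t), with the common defaults α = 0.25 and γ = 2; and the gradient-blending weight estimate from Wang, Tran, and Feiszli (2020), which weights each head's loss in proportion to its generalization gain divided by the square of its overfitting. The function names and all inputs are hypothetical, and this is not WPIE's code.

```python
import math


def focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Binary focal loss for one example (Lin et al., 2017).

    p is the predicted probability that the example is positive; y is 0 or 1.
    The (1 - p_t) ** gamma factor down-weights easy, confidently correct
    examples so they do not overwhelm training.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def blend_weights(val_before: dict[str, float], val_after: dict[str, float],
                  train_before: dict[str, float], train_after: dict[str, float],
                  eps: float = 1e-8) -> dict[str, float]:
    """Per-head loss weights w_k proportional to dG_k / dO_k**2 (gradient-blending estimate).

    dG_k: drop in validation loss over a training window (generalization).
    dO_k: growth of the validation-minus-training loss gap (overfitting).
    Clamping dG at zero and adding eps are our own safety additions.
    """
    raw = {}
    for head in val_before:
        d_g = val_before[head] - val_after[head]
        d_o = (val_after[head] - train_after[head]) - (val_before[head] - train_before[head])
        raw[head] = max(d_g, 0.0) / (d_o ** 2 + eps)
    z = sum(raw.values())
    if z == 0.0:  # no head generalized better; fall back to uniform weights
        return {head: 1.0 / len(raw) for head in raw}
    return {head: w / z for head, w in raw.items()}


easy = focal_loss(0.95, 1)  # confidently correct positive
hard = focal_loss(0.30, 1)  # misclassified positive
print(f"easy={easy:.6f} hard={hard:.6f}")

# Hypothetical losses for a text head, an image head, and a fused text+image head.
weights = blend_weights(
    val_before={"text": 0.60, "image": 0.70, "fused": 0.55},
    val_after={"text": 0.50, "image": 0.68, "fused": 0.45},
    train_before={"text": 0.55, "image": 0.60, "fused": 0.50},
    train_after={"text": 0.40, "image": 0.40, "fused": 0.30},
)
print({head: round(w, 3) for head, w in weights.items()})
```

With these made-up numbers, focal loss shrinks the easy example's loss far more than the hard example's, and the image head, whose train/validation gap widens fastest, receives the smallest blending weight.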
We also use post-level, self-supervised learning to build a pretrained universal representation of content for integrity problems. And we’ve deployed XLM-R, a model that leverages our state-of-the-art RoBERTa AI architecture, to improve our hate speech classifiers in multiple languages across Facebook and Instagram. XLM-R is now also part of our system to scale our COVID-19 Community Hub feature internationally.
We have much more work to do. Detecting hate speech is not only a difficult challenge. It's also a constantly evolving one. A new piece of hate speech might not resemble previous examples because it references a new trend or a new news story. We've built tools like RIO and WPIE so they can scale to future challenges as well as present ones. With more flexible and adaptive AI, we're confident we can continue to make real progress.
Mike Schroepfer, Chief Technology Officer