In recent years the patent industry has begun to use machine-learning (ML) algorithms to add efficiency and insights to business practices.
Any company, patent office, or academic institution that works with patents—generating them through innovation, processing applications about them, or developing sophisticated ways to analyze them—will benefit from doing patent analytics and machine learning in Google Cloud.
We are excited to release a white paper that outlines a methodology for training a BERT (Bidirectional Encoder Representations from Transformers) model on over 100 million patent publications from the U.S. and other countries using open-source tooling. The paper describes how to use the trained model for a number of use cases, including performing prior art searches more effectively to determine the novelty of a patent application, automatically generating classification codes to assist with patent categorization, and autocompleting patent text. The white paper is accompanied by a Colab notebook as well as the trained model, hosted on GitHub.
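As an illustration of the prior art search use case, one common pattern is to embed each patent document with a model like the one described in the white paper and rank candidates by cosine similarity to a query application. The sketch below uses small random vectors as stand-ins for real embeddings (the function names and toy data are our own, not from the paper); in practice, each vector would come from the trained BERT model.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_prior_art(query_vec, corpus_vecs):
    """Return corpus indices sorted by similarity to the query, best first."""
    scores = [cosine_similarity(query_vec, v) for v in corpus_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Toy stand-in embeddings; real ones would come from the trained patent model.
rng = np.random.default_rng(0)
corpus = [rng.standard_normal(8) for _ in range(5)]
# A query that is a slightly perturbed copy of document 2 (a near-duplicate).
query = corpus[2] + 0.01 * rng.standard_normal(8)
print(rank_prior_art(query, corpus)[0])  # index of the most similar document
```

The ranking here is a nearest-neighbor search over dense vectors; at the scale of the full patent corpus, an approximate nearest-neighbor index would typically replace the brute-force loop.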
Google’s release of the BERT model (paper, blog post, and open-source code) in 2018 was an important breakthrough that leveraged transformers to outperform other leading state-of-the-art models across major NLP benchmarks, including GLUE, MultiNLI, and SQuAD. Shortly after its release, the BERT framework and many transformer-based extensions of it gained widespread industry adoption across domains such as search, chatbots, and translation.
We believe that the patents domain is ripe for the application of algorithms like BERT due to the technical characteristics of patents as well as their business value. Technically, the patent corpus is large (millions of new patents are issued every year worldwide), complex (patent applications generally average ~10,000 words and are often meticulously wordsmithed by inventors, lawyers, and patent examiners), unique (patents are written in a highly specialized ‘legalese’ that can be unintelligible to a lay reader), and highly context dependent (many terms are used to mean completely different things in different patents).
Patents also represent tremendous business value to a number of organizations: corporations spend tens of billions of dollars a year developing patentable technology and transacting the rights to use it, and patent offices around the world spend billions more each year reviewing patent applications.
We hope that our new white paper and its associated code and model will help the broader patent community in its application of ML, including:
- Corporate patent departments looking to improve their internal models and tooling with more advanced ML techniques.
- Patent offices interested in leveraging state-of-the-art ML approaches to assist with patent examination and prior art searching.
- ML and NLP researchers and academics who might not have considered using the patents corpus to test and develop novel NLP algorithms.
- Patent researchers and academics who might not have considered applying the BERT algorithm or other transformer based approaches to their study of patents and innovation.
To learn more, you can download the full white paper, Colab notebook, and trained model. Additionally, see Google Patents Public Datasets: Connecting Public, Paid, and Private Patent Data, Expanding your patent set with ML and BigQuery, and Measuring patent claim breadth using Google Patents Public Datasets for more tutorials to help you get started with patent analytics in Google Cloud.
By Rob Srebrovic, Data Scientist, Global Patents, and Jay Yonamine, Head of Data Science, Global Patents at Google
Source: Google Cloud Blog