Self-driving cars, text to speech, artificial intelligence (AI) services and delivery drones — just a few obvious applications of AI. To keep fueling the AI gold rush, we’ve been improving the very heart of AI hardware technology: digital AI cores that power deep learning, the key enabler of artificial intelligence.
At IBM Research, we’ve been making strides in adapting to workload complexities of AI systems while streamlining and accelerating performance – by innovating across materials, devices, chip architectures and the entire software stack, bringing closer the next generation AI computational systems with cutting-edge performance and unparalleled energy efficiency.
In a new paper presented at the 2021 International Solid-State Circuits Virtual Conference (ISSCC), our team details the world’s first energy efficient AI chip at the vanguard of low precision training and inference built with 7nm technology. Through its novel design, the AI hardware accelerator chip supports a variety of model types while achieving leading edge power efficiency on all of them.
This chip technology can be scaled and used for many commercial applications — from large-scale model training in the cloud to security and privacy efforts by bringing training closer to the edge and data closer to the source. Such energy efficient AI hardware accelerators could significantly increase compute horsepower, including in hybrid cloud environments, without requiring huge amounts of energy.
AI models are growing quickly in both sophistication and adoption, and are now used for drug discovery, modernizing legacy IT applications and writing code for new applications. But the rapid growth in model complexity also increases the technology’s energy consumption, and a major challenge has been to create sophisticated AI models without growing their carbon footprint. Historically, the field has simply accepted that if the computational need is big, so too will be the power needed to fuel it.
But we want to change this approach and develop an entirely new class of energy-efficient AI hardware accelerators that will significantly increase compute power without requiring exorbitant energy.
Tackling the problem
Since 2015, we’ve been consistently improving the power performance of AI chips, boosting it by 2.5 times every year. To do so, we’ve been creating algorithmic techniques that enable training and inference without loss of prediction accuracy. We’ve also been developing architectural innovations and chip designs that allow us to build highly efficient compute engines able to execute more complex workloads with high sustained utilization and power efficiency. And we’ve been creating a software stack that renders the hardware transparent to the application developer and compatible across hybrid cloud infrastructure, from cloud to edge.
We remain the leaders in driving reduced precision for AI models [Figure 1], with industry-wide adoption. We’ve extended reduced precision formats to 8-bit for training and 4-bit for inference, and developed data communication protocols that enable AI cores on a multi-core chip to exchange data effectively with each other. Most recently, our team demonstrated 4-bit formats for training at NeurIPS 2020 (see “IBM breakthroughs could help bring AI training from cloud to edge”).
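To make the idea of reduced precision concrete, here is a minimal Python sketch of storing weights in 4 bits. It assumes plain symmetric uniform integer quantization, a generic textbook scheme chosen for illustration; it is not the chip's actual number format, and the function names and sample weights are hypothetical.

```python
def quantize_int4(values):
    """Map floats to signed 4-bit codes in [-7, 7] using one shared scale.

    Generic symmetric uniform quantization (an assumption for illustration),
    not the format used by the chip described in this post.
    """
    max_abs = max(abs(v) for v in values)
    # One scale factor for the whole tensor; fall back to 1.0 for all zeros.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the 4-bit signed range.
    codes = [max(-7, min(7, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from the 4-bit codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only to show the round trip.
weights = [0.70, -0.32, 0.11, 0.0]
codes, scale = quantize_int4(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight now needs 4 bits instead of 16 or 32, at the cost of a rounding error of at most half the scale factor per value. Keeping that error from hurting model accuracy is the job of the algorithmic techniques mentioned above.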
Our new ISSCC paper reflects the latest stage in these advancements, focused on the creation of a chip that is highly optimized for low-precision training and inference for all of the different AI model types — without any loss of quality at the application level.
We showcase several novel characteristics of the chip. To start with, it’s the first silicon chip ever to incorporate ultra-low precision hybrid FP8 (HFP8) formats for training deep-learning models in a state-of-the-art silicon technology node (7 nm EUV-based chip). Also, the raw power efficiency numbers are state of the art across all different precisions. Figure 2 shows that our chip’s performance and power efficiency exceed those of other dedicated inference and training chips.
But this is not all. It’s one of the first chips to incorporate power management in AI hardware accelerators. In this research, we show that we can maximize the performance of the chip within its total power budget, by slowing it down during computation phases with high power consumption.
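The benefit of slowing down only during power-hungry phases can be sketched with a toy model. The Python below assumes power scales linearly with clock frequency, which is a deliberate simplification, and uses made-up phase data; it is not the chip's actual power-management scheme. It compares throttling each phase individually against running the whole workload at one clock that is safe for the worst phase.

```python
def per_phase_runtime(phases, budget_w):
    """Total runtime when each phase is throttled just enough to fit the budget.

    phases: list of (work_seconds_at_full_clock, power_watts_at_full_clock).
    Assumes power scales linearly with clock frequency (a simplification).
    """
    total = 0.0
    for work, power_at_full in phases:
        # Run at full speed unless that would exceed the power budget.
        freq = min(1.0, budget_w / power_at_full)
        total += work / freq
    return total


def static_runtime(phases, budget_w):
    """Total runtime with a single fixed clock that is safe for the worst phase."""
    worst_power = max(power for _, power in phases)
    freq = min(1.0, budget_w / worst_power)
    return sum(work for work, _ in phases) / freq


# Hypothetical workload: three phases, the middle one exceeds the budget.
phases = [(10.0, 80.0), (10.0, 160.0), (10.0, 100.0)]
budget = 100.0

dynamic = per_phase_runtime(phases, budget)  # only the 160 W phase is slowed
static = static_runtime(phases, budget)      # every phase is slowed
```

In this toy workload the per-phase approach finishes in 36 seconds versus 48 seconds for the fixed clock, because only the one phase that would break the budget pays the slowdown.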
Finally, we demonstrate that our chip, in addition to great peak performance, has high sustained utilization that translates to real application performance and is a key part of engineering our chip for energy efficiency. Our chips routinely achieve more than 80 percent utilization for training and more than 60 percent utilization for inference, compared with typical GPU utilizations that are well below 30 percent.
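Why utilization matters as much as peak numbers comes down to simple arithmetic: delivered throughput is peak throughput multiplied by the fraction of time the compute units are busy. The short Python sketch below uses a hypothetical peak of 100 units for both devices and treats the 80 and 30 percent figures quoted above as round illustrative values, not measurements.

```python
def sustained_throughput(peak_ops_per_s, utilization):
    """Delivered throughput = peak throughput x fraction of time doing useful work."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return peak_ops_per_s * utilization


# Hypothetical peak, identical for both devices to isolate the utilization effect.
PEAK = 100.0

high = sustained_throughput(PEAK, 0.80)  # utilization level cited for training
low = sustained_throughput(PEAK, 0.30)   # upper bound cited for typical GPUs
advantage = high / low
```

With equal peak performance, the device running at 80 percent utilization delivers roughly 2.7 times the useful work of one running at 30 percent.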
Our new AI core and chip can be used for many new cloud to edge applications across multiple industries. For instance, they can be used for cloud training of large-scale deep learning models in vision, speech and natural language processing using 8-bit formats (vs. the 16- and 32-bit formats currently used in the industry). They can also be used for cloud inference applications, such as for speech to text AI services, text to speech AI services, NLP services, financial transaction fraud detection and broader deployment of AI models in financial services.
Autonomous vehicles, security cameras and mobile phones can benefit from the chip too, and it can support federated learning at the edge for customization, privacy, security and compliance.
We hope that through this work, we can establish an entirely new way of creating and deploying AI models that scale performance and cut power consumption. Please check out the IBM Research AI Hardware Center for more information about our research and our team.