Large language models — natural language processing (NLP) systems with more than 100 billion parameters — have transformed NLP and AI research over the last few years. Trained on a massive and varied volume of text, they show surprising new capabilities to generate creative text, solve basic math problems, answer reading comprehension questions, and more. While in some cases the public can interact with these models through paid APIs, full research access is still limited to only a few highly resourced labs. This restricted access has limited researchers’ ability to understand how and why these large language models work, hindering progress on efforts to improve their robustness and mitigate known issues such as bias and toxicity.
In line with Meta AI’s commitment to open science, we are sharing Open Pretrained Transformer (OPT-175B), a language model with 175 billion parameters trained on publicly available data sets, to allow for more community engagement in understanding this foundational new technology. For the first time for a language technology system of this size, the release includes both the pretrained models and the code needed to train and use them. To maintain integrity and prevent misuse, we are releasing our model under a noncommercial license to focus on research use cases. Access to the model will be granted to academic researchers; those affiliated with organizations in government, civil society, and academia; and industry research laboratories around the world.
We believe the entire AI community — academic researchers, civil society, policymakers, and industry — must work together to develop clear guidelines around responsible AI in general and responsible large language models in particular, given their centrality in many downstream language applications. A much broader segment of the AI community needs access to these models in order to conduct reproducible research and collectively drive the field forward. With the release of OPT-175B and smaller-scale baselines, we hope to increase the diversity of voices defining the ethical considerations of such technologies.
Responsible publication with OPT-175B
Following the publication guidelines for researchers generated by the Partnership on AI, along with the governance guidance outlined by NIST in March 2022 (section 3.4), we are releasing all our notes documenting the development process, including the full logbook detailing the day-to-day training process, so other researchers can more easily build on our work. Furthermore, these details disclose how much compute was used to train OPT-175B and the human overhead required when underlying infrastructure or the training process itself becomes unstable at scale.
We are sharing OPT-175B, along with the codebase used to train and deploy the model using only 16 NVIDIA V100 GPUs, in order to increase the accessibility of these models specifically for research purposes and to provide a foundation for analyzing potential harms rooted in quantifiable metrics on a common, shared model. We are also fully releasing a suite of smaller-scale baseline models, trained on the same data set and using similar settings as OPT-175B, to enable researchers to study the effect of scale alone. These smaller-scale models come in seven sizes: 125 million, 350 million, 1.3 billion, 2.7 billion, 6.7 billion, 13 billion, and 30 billion parameters, with a 66 billion parameter model to be released soon.
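As a rough sanity check on why 16 V100s can serve a 175-billion-parameter model, the sketch below works through the weight-memory arithmetic. The assumptions (half-precision weights, 32 GB per V100, no allowance for activations or attention caches) are ours, not figures from the release itself.

```python
# Back-of-the-envelope memory estimate for serving a 175B-parameter model
# sharded across 16 GPUs. Assumptions (not from the announcement):
# fp16 weights (2 bytes/parameter), 32 GB V100s, and weights only --
# activations and caches would add further overhead.

PARAMS = 175e9          # model parameters
BYTES_PER_PARAM = 2     # fp16 weight storage
NUM_GPUS = 16
GPU_MEM_GB = 32         # memory per 32 GB V100

weight_gb = PARAMS * BYTES_PER_PARAM / 1e9   # total weight memory in GB
per_gpu_gb = weight_gb / NUM_GPUS            # share per GPU if split evenly

print(f"total weights: {weight_gb:.0f} GB")                      # 350 GB
print(f"per GPU: {per_gpu_gb:.1f} GB of {GPU_MEM_GB} GB")        # 21.9 GB of 32 GB
```

Under these assumptions the weights alone fit with room to spare per card, which is consistent with a 16-GPU deployment being feasible for inference.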
Recent developments in AI research have consumed an extraordinary amount of compute power. While industry labs have started to report the carbon footprint of these models, most do not include the computational cost associated with the R&D phases of experimentation, which in some cases can be an order of magnitude more resource-intensive than training the final model.
We developed OPT-175B with energy efficiency in mind, successfully training a model of this size with only 1/7th the carbon footprint of GPT-3. This was achieved by combining Meta’s open source Fully Sharded Data Parallel (FSDP) API and NVIDIA’s tensor parallel abstraction within Megatron-LM. We achieved ~147 TFLOP/s/GPU utilization on NVIDIA’s 80 GB A100 GPUs, roughly 17 percent higher than the figures published by NVIDIA researchers on similar hardware.
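To put the ~147 TFLOP/s/GPU figure in perspective, the sketch below applies the common 6·N·D rule of thumb (roughly 6 FLOPs per parameter per training token). The 180-billion-token count is an assumption drawn from the accompanying OPT paper rather than this post, so treat the result as an order-of-magnitude estimate only.

```python
# Order-of-magnitude training-cost estimate using the 6 * N * D rule
# (~6 FLOPs per parameter per token). The parameter count and per-GPU
# throughput come from the post; the token count is an assumption.

N = 175e9                 # parameters (from the post)
D = 180e9                 # training tokens (assumed, not stated here)
TFLOPS_PER_GPU = 147e12   # achieved throughput per 80 GB A100 (from the post)

total_flops = 6 * N * D                        # total training FLOPs
gpu_seconds = total_flops / TFLOPS_PER_GPU     # at sustained throughput
gpu_days = gpu_seconds / 86400

print(f"total FLOPs: {total_flops:.2e}")   # 1.89e+23
print(f"GPU-days: {gpu_days:,.0f}")        # ~14,881
```

Spread over roughly a thousand GPUs, that corresponds to a few weeks of sustained training time, which is why small differences in per-GPU utilization translate into large differences in energy use.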
By sharing these baselines along with the codebase to train a 175B model efficiently, we have an opportunity to reduce our collective environmental footprint while also allowing new results and progress in the field to be measurable in a consistent manner.
Propelling research forward through open collaboration
For AI research to advance, the broader scientific community must be able to work together with cutting-edge models to explore their potential while also probing for their vulnerabilities. As with our previous open-science initiatives, such as the Image Similarity Challenge, the Deepfake Detection Challenge, and the Hateful Memes Challenge, Meta AI believes that collaboration across research organizations is critical to the responsible development of AI technologies.
While there are many exciting developments in the space of large language models, the limitations and risks these models pose are still not well understood. Without direct access to these models, researchers are limited in their ability to design detection and mitigation strategies for possible harm, leaving that work in the hands of only those with sufficient capital to access models of this scale. We hope that OPT-175B will bring more voices to the frontier of large language model creation, help the community collectively design responsible release strategies, and add an unprecedented level of transparency and openness to the development of large language models in the field.
Pretrained models are all licensed under the OPT-175B License Agreement.
This work on large-scale pretraining is being undertaken by a multidisciplinary team that includes Stephen Roller, Naman Goyal, Anjali Sridhar, Punit Singh Koura, Moya Chen, Kurt Shuster, Mikel Artetxe, Daniel Simig, and Tianlu Wang. Advisers for responsible AI conduct also include Adina Williams, Eric Smith, Emily Dinan, Y-Lan Boureau, Melanie Kambadur, and Joelle Pineau.
By Susan Zhang, Research Engineer | Mona Diab, Research Scientist | Luke Zettlemoyer, Research Director
Source: Meta AI