“One of the most salient features of our culture is that there is so much bullshit.” These are the opening words of the short book On Bullshit, written by the philosopher Harry Frankfurt. Fifteen years after the publication of this surprise bestseller, the rapid progress of research on artificial intelligence is forcing us to reconsider our conception of bullshit as a hallmark of human speech, with troubling implications. What do philosophical reflections on bullshit have to do with algorithms? As it turns out, quite a lot.
In May this year, the company OpenAI, co-founded by Elon Musk in 2015, introduced a new language model called GPT-3 (for “Generative Pre-trained Transformer 3”). It took the tech world by storm. On the surface, GPT-3 is like a supercharged version of the autocomplete feature on your smartphone: it can generate coherent text based on an initial input, known as a prompt. But GPT-3’s text-generating abilities go far beyond anything your phone is capable of. It can disambiguate pronouns, translate, infer, and analogize, and it can even perform some forms of common-sense reasoning and arithmetic. It can generate fake news articles that human readers distinguish from genuine ones at rates barely above chance. Given a definition, it can use a made-up word in a sentence. It can rewrite a paragraph in the style of a famous author. Yes, it can write creative fiction. Or generate code for a program based on a description of its function. It can even answer queries about general knowledge. The list goes on.
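To make the idea of prompting concrete, here is a minimal sketch of what querying GPT-3 through OpenAI’s API looked like at the time, assuming access to the beta and the `openai` Python package; the prompt text is illustrative, and the placeholder API key is not real.

```python
# Minimal sketch of prompting GPT-3 via OpenAI's API (circa 2020).
# Assumes the `openai` package and beta API access; key is a placeholder.
import openai

openai.api_key = "YOUR_API_KEY"

# GPT-3 simply continues the text it is given: the prompt is the
# "initial input," and the model generates a plausible completion.
response = openai.Completion.create(
    engine="davinci",      # the largest GPT-3 model exposed by the API
    prompt="The philosopher Harry Frankfurt argued that bullshit is",
    max_tokens=50,         # length of the generated continuation
    temperature=0.7,       # nonzero values inject randomness into sampling
)

print(response.choices[0].text)
```

The same interface drives all of the feats listed above: translation, pastiche, question answering, and the rest are just different prompts fed to the same text-continuation machine.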
GPT-3 is a marvel of engineering due to its breathtaking scale. It contains 175 billion parameters (the weights in the connections between the “neurons,” or units, of the network) distributed over 96 layers. It produces embeddings in a vector space with 12,288 dimensions. And it was trained on hundreds of billions of words representing a significant subset of the Internet, including the entirety of English Wikipedia, countless books, and a dizzying number of web pages. Training the final model alone is estimated to have cost around $5 million. By all accounts, GPT-3 is a behemoth. Scaling up the size of the network and its training data, without fundamental improvements to the years-old Transformer architecture, was sufficient to bootstrap the model into unexpectedly remarkable performance on a range of complex tasks, out of the box. Indeed, GPT-3 is capable of “few-shot” learning, in which it performs a new task after seeing only a handful of examples in its prompt, and even, in some cases, “zero-shot” learning, in which it performs a new task without being given any example of what success looks like.
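The distinction is easiest to see in the prompts themselves. Below is an illustrative sketch, with a hypothetical translation task not drawn from the GPT-3 paper, of how few-shot and zero-shot prompts differ.

```python
# Illustrative sketch of "few-shot" vs. "zero-shot" prompting.
# The task and prompt wording are hypothetical examples.

# Few-shot: the prompt contains a handful of solved examples,
# and the model infers the task from the pattern.
few_shot_prompt = """English: cheese
French: fromage
English: house
French: maison
English: book
French:"""

# Zero-shot: the prompt describes the task with no examples at all.
zero_shot_prompt = "Translate the English word 'book' into French:"

# Either prompt would be sent to the model exactly as in the earlier
# sketch. In both cases GPT-3 only ever predicts the next word; the
# "learning" lies in how it conditions on the prompt, with no updates
# to its 175 billion parameters.
print(few_shot_prompt)
print(zero_shot_prompt)
```

What is striking is that nothing in GPT-3’s training singled out translation, or any other task, for special treatment; the capability falls out of sheer scale.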