What if we trained AI to complete equations instead of images of cats?

Remember that shock of seeing some breakthrough for the first time?

… Shockingly plausible solutions?

That AI was ImageGPT from OpenAI, where GPT stands for Generative Pre-trained Transformer. Surely I am not the only one who immediately went crazy with ideas about what to find a “plausible solution” for next.

I saw an unfathomably big probability space reduced to a thin, focused slice.

What is more shocking, it was reduced to that thin slice by following a lot of very complex rules that the model observed and learned. Notice how the position of the light source influenced the lighting in the generated pedestrian scene.

“Q: What is the result of 1+1, and why?”

“A: The result of 1+1 is 2, according to the rules seen in arxivurl1 and arxivurl2”

An advantage of training on all peer-reviewed scientific sources:
The model learns to arrive at conclusions by observing the thought process down to its basic elements (arXiv), not just the conclusions themselves (Wikipedia).

I generated this image entirely with Gann AI, just by pushing a button.

There is no such thing as Sci-Fi. Just predictions.

Once the model has been trained and tested on increasingly complex sets of existing equations, even going as far as training on all known competing unified theory proposals, one can start asking for the completion of the ultimate theory itself.

What will shockingly plausible solutions look like now?

What hidden rules will the AI observe and apply from all the human scientific data and knowledge it was trained on? Will it complete an equation with a term describing a pattern it observed in the data of some LHC paper?

Can AI learn new Math or Physics rules just by observation?

Here is an example of what GPT-3 produced just for this article. Remember, it was not explicitly taught anything. It was just shown a lot of mostly internet text and a few basic math examples, yet somehow it seems to have learned this …

Test output of the GPT-3 beta as of 6 March 2021

“With enough complexity there is computational utility in transformers…” Yannic Kilcher

Think of equations as observed shadows of some complex structure projected to our limited plane of understanding.

The brilliant physicist Antony Garrett Lisi was able to spot patterns within the properties of known particles and map them to the E8 Lie group shown above. He was thus, in theory, able to predict particles we have not even seen yet. An unbelievable breakthrough if his theory is one day complete, right?

Equations are just a language too, with equally learnable rules.

If it can learn known math rules just by observation,
then it can learn unknown math rules as well.
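
To make the “equations are just language” idea a bit more concrete, here is a minimal sketch. It is not how GPT-style models actually tokenize (they typically use learned subword vocabularies); the regex and the `tokenize` helper are illustrative assumptions. The only point is that a LaTeX equation turns into a plain token sequence, just like a sentence does.

```python
import re

# Hypothetical toy tokenizer (an assumption, not a real model's vocabulary):
# LaTeX commands, numbers, words, then any other single non-space symbol.
TOKEN_PATTERN = re.compile(r"\\[A-Za-z]+|\d+(?:\.\d+)?|[A-Za-z]+|\S")


def tokenize(source: str) -> list[str]:
    """Split text or LaTeX source into a flat list of string tokens."""
    return TOKEN_PATTERN.findall(source)


sentence_tokens = tokenize("Cats are plausible .")
equation_tokens = tokenize(r"\frac{a}{b} = c")

# Both inputs end up as the same kind of object: a sequence of tokens
# that a next-token predictor could be trained on.
print(sentence_tokens)
print(equation_tokens)
```

Seen this way, completing an equation and completing a sentence are the same task for the model: predict the next token given the previous ones.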

The multidimensional nature of AI weights has so far proven able to observe and learn the hidden rules of these higher-dimensional constructs and to project new shadows of them, that is, new equations.

The current Juggernauts of language models.

OpenAI’s GPT-3 has 175 billion parameters. Google claims 600 billion parameters but has given us no API to test yet. To gain scalability, Google did not go for the expensive more-layers approach but opted for juggling between many more task-specific(?) feed-forward networks, reducing training time to 4 days on 2048 TPUs(?).

“Mixture-of-experts-based models tend to significantly underperform monolithic (regular) models for the same number of parameters.” EleutherAI FAQ

GPT-3 undeniably often produces fascinating answers on subjects that were frequently present in its training dataset. Heck, it even seems to have learned simple math from having seen just simple math examples.

A model that observed multiple thought processes arriving at multiple conclusions is better than a model that just observed multiple conclusions.

So what if we trained it on the whole of arXiv.org, including the complex equations that are often 50% of the content or more?


We are getting there fast. Thanks to EleutherAI, an excellent open-source GPT-3 clone model, GPT-Neo, was recently born together with the dataset “The Pile”, and it is trained on a large chunk of arXiv and DeepMind Math examples. Yay ;D

My approach to the dataset would be completely different from the currently common …

Let’s just throw a lot of random text at it

You would not start teaching a child high school math without basic math first.

If there is such a thing as new-information density, then not all training text is equal.

For example.

So new-information density and order are extremely important for the model to be able to unpack, understand, and extract this new higher-level information.
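
One way to picture the ordering idea is a toy curriculum sort. This is only a sketch under a loud assumption: the article does not define how to measure new-information density, so the `difficulty` function below uses the number of distinct whitespace-separated tokens as a hypothetical stand-in, and the three sample questions are made up for illustration.

```python
def difficulty(example: str) -> int:
    """Hypothetical proxy for difficulty: count of distinct tokens.

    This is an assumption for illustration only, not a metric
    proposed by the article or used by any real training pipeline.
    """
    return len(set(example.split()))


def curriculum_order(examples: list[str]) -> list[str]:
    """Return the examples sorted from 'basic' to 'advanced'."""
    return sorted(examples, key=difficulty)


# Made-up sample questions, deliberately listed in the "wrong" order.
examples = [
    "Let x(g) = 9*g + 1. Let q(c) = 2*c + 1. Calculate q(x(a)).",
    "Calculate 1 + 1.",
    "Solve 2*r + 3 = 11 for r.",
]

ordered = curriculum_order(examples)
for item in ordered:
    print(difficulty(item), item)
```

Any real measure of density would need to be far smarter than counting tokens, but the shape of the pipeline stays the same: score every training text, then feed the basics first.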

Question: Solve -42*r + 27*c = -1167 and 130*r + 4*c = 372 for r.
Answer: 4

Question: Calculate -841880142.544 + 411127.
Answer: -841469015.544

Question: Let x(g) = 9*g + 1. Let q(c) = 2*c + 1. Let f(i) = 3*i - 39. Let w(j) = q(x(j)). Calculate f(w(a)).
Answer: 54*a - 30
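
The three sample answers above can be checked with a few lines of standard-library Python. This is just a verification sketch; the helper name `solve_2x2` and the variable names are hypothetical and not part of any dataset tooling.

```python
from decimal import Decimal
from fractions import Fraction


def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*r + b1*c = c1 and a2*r + b2*c = c2 with Cramer's rule."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("system has no unique solution")
    r = Fraction(c1 * b2 - c2 * b1, det)
    c = Fraction(a1 * c2 - a2 * c1, det)
    return r, c


# Question 1: -42*r + 27*c = -1167 and 130*r + 4*c = 372, solve for r.
r, c = solve_2x2(-42, 27, -1167, 130, 4, 372)

# Question 2: Decimal keeps the addition exact (no float rounding).
total = Decimal("-841880142.544") + Decimal("411127")


# Question 3: compose the functions exactly as the question defines them.
def x(g):
    return 9 * g + 1


def q(value):
    return 2 * value + 1


def f(i):
    return 3 * i - 39


def w(j):
    return q(x(j))


# f(w(a)) is linear in a, so agreeing with 54*a - 30 on many points
# confirms the two expressions are identical.
composition_matches = all(f(w(a)) == 54 * a - 30 for a in range(-10, 11))

print(r, total, composition_matches)
```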

Exciting times

Of course, I immediately went crazy and proposed this to arXiv, OpenAI, and Google in an incoherent and overexcited email ;D Just like this article.

Because sometimes even the smallest spark can lead to a big fire.

And yes, I know. Everyone has a lot on their plate nowadays. I also know it’s not like you flick a switch and you’re done. But when you think about it more, it can use already existing text-only toolchains…


Indeed, that is how a 76 GB arXiv dataset was born in the meantime. See this paper.

  • Will these now more scientifically insightful completions inspire new directions or points of view in math and physics if everyone had access to them?
  • Will advancements in one field or paper now propagate more easily to others because the network proposes them wherever it sees an opportunity? That is, the first truly multidisciplinary insight?
  • Will the output now always contain at least some hint of the solution, thus reducing the search space of problems we usually try to solve with brute force? Often, after decades of work on an equation, just getting a hint meant a breakthrough.
  • Will it be able to propose equations for arbitrary observed data, i.e. a set of numbers given in context, simply because it has seen similar behavior described or proposed in some paper? For LHC data, say?
  • Will it allow a faster search for solutions in a sudden global crisis?
  • What will feature visualization of the network that is now able to understand equations look like? Noise? Patterns? Fractals?
  • Could something like a disentangled variational autoencoder (beta-VAE?) detect, extract, and visualize new hidden math or physics fundamentals from such a network’s latent space if the right set of successive equations is asked?
  • If OpenAI CLIP were trained on such a dataset, what would its weirdly category-specific neurons actually isolate and represent?

We should dream BIG to be excited about every next morning.

This article is republished from Medium by Ladislav Nevery.
