Reinforcement learning and reasoning

Reinforcement learning has seen a lot of progress in recent years. From DeepMind success with teaching machines how to play Atari games, then AlphaGo beating world champions in Go to recent OpenAI’s progress on Dota 2, a multiplayer game where players divided into two teams compete with each other. The common thread is an artificial agent operating in a virtual world, where the prize is clear (e.g. win the game), but strategies to attain this prize are complex and evolving with responses of other players.

On the other hand people are experimenting with AI agents operating in real-world. Each clip of Boston Dynamics gets a lot of press, showing robots performing amazing stunts, as you can see yourself here or here. OpenAI has a structured approach to teaching robots particular skills which demand dexterity, like manipulating objects in a hand.


1. Machine reasoning

Natural question is what will be next in the development of AI and reinforcement learning in particular. Classical machine learning feeds on large datasets. With deep learning, smart architecture or both (cf. one-shot learning) you can sometimes reduce the amount and quality of data needed to make a machine learn a particular task, however the learning rate is still sub-optimal compared to humans. If one were to create an Artificial General Intelligence (AGI), a machine able to perform any task at least as well as an average human, then in particular one would need to overcome learning difficulties. Those difficulties arise in computing power, transfer learning and reinforcement learning. The reason why humans are able to learn on small datasets is two-fold:

The equivalent of the first point in AI is called transfer learning (or meta-learning), and currently is still largely underdeveloped. The same goes for reasoning with machine reasoning being in its infancy. Eventually all comes down to reasoning, with transfer learning being a relatively easier task — if you can reason, you can in particular classify and compare, and thus you are able to find similarities in tasks you want to approach, and then act upon it, by training for a new skill (of course, it’s much harder than that, but that’s another post).

Machine reasoning is a true bottleneck in the current development of AI. We can’t even understand how context works — just talk with any bot to verify it yourself — not to mention any harder tasks. So far logical reasoning was outside of scope of machine learning.

2. Mathematics

A perfect example of pure reasoning to test any machine reasoning capabilities is mathematics. Each proof of a theorem consists of many steps, logically building upon each other, often dependent on already proven facts. Mathematicians write their proofs in natural language, which is to some extent formal, but far from formalized (in the sense of formal languages). Validation of a proof is done by other mathematicians, fellow experts in a given field, after sending a paper to a scientific journal. This procedure has many drawbacks: not only it is slow, but it depends on other experts. Moreover proofs and arguments themselves have often many omissions, after all one doesn’t have to write down what’s widely accepted in a given community (branch of mathematics). Publishing incorrect proofs in prestigious journals is not unheard of.

My road to artificial intelligence started by asking whether this procedure can be made more reliable and much faster, especially as my domain of research, Langlands program, was extremely complicated. The first natural thing that comes to mind is that you should made all those proofs formal. That was already an idea of Voevodsky couple of years ago, but more or better formalization is not an answer. Nobody is going to force mathematicians to write their proofs more formally (as it would destroy the whole pleasure of doing research and is widely considered a waste of time). Actually formalizing mathematics is a whole field of mathematics with its own dogma and problems. People rewrite a proof in one of formal languages (Coq, Mizar, Isabelle to name a few) and then can validate the proof automatically. However the labour needed to formalize a proof in one of those languages is huge — for example formal verification of Feit-Thompson theorem took 6 years of collaborative effort. This is what caused me to think that if we are to succeed at formalization and verification of mathematics (a baby step towards reasoning), we need to automate the whole process. I’ve written my ideas down as DeepAlgebra program and kept on looking for how to make it out, with no real success so far (I should explain it in another post). That was before I learned more about reinforcement learning.

3. Reinforcement learning and mathematics

I started with reinforcement learning for a reason. I view mathematics as a game with a clear prize (proof of theorem) and a complex strategy (steps of the proof). The way mathematicians do mathematics is similar to how reinforcement learning works: they search for potential proofs, by going deeper and deeper into a tree of the game (all possible proofs), backtracking whenever their intuition (learned by arriving at a contradiction) is telling them to do so. They search for a right path, right strategy to arrive at the proof, by manipulating objects in the formulation of the theorem, introducing new entities, recalling already proven theorems and measuring their progress by evaluating how many more steps they would presumably need (i.e. how many lemmas/facts) to prove their theorem. Mathematicians generally have an intuition how far away they are from a solution. It comes once they worked on a problem for some time and explored a couple of paths.

Intuition of mathematicians is similar to an intuition that Go players have. They too explore a tree of the game (in their heads playing moves ahead and thinking about what an opponent might respond and where it leads), they have an intuition about ‘shapes’ (formation of stones on a board) just as mathematicians have an intuition about mathematical entities they introduce. Their goal is to win the game but the game is too complex to determine it before-hand, thus they need to evaluate their position on a board with each move to see whether they are winning or losing, and how that should influence their moves. Go is complex because it allows for local vs global phenomena, you can be losing on one side of the board, but you still win overall. You have the same phenomena in mathematics: ‘local’ are lemmas and already proven theorems, ‘global’ is the theorem you want to prove right now. Strategies play a crucial part in both games as they allow you to have a general direction for approaching the game.

Overall it seems like reinforcement learning methods similar to those used in AlphaGo can help in approaching mathematical proofs. It also seems as a natural next step for machine learning research in terms of difficulties arising.

4. Conclusion

In order to develop AGI, or at least come closer to building real intelligence, we must teach machines how to reason. Mathematics is a perfect test ground for this task with huge online repositories of proofs and a formal way of reasoning. Moreover its similarities to playing a game, make it a good target for trying out reinforcement learning techniques. It’s the right time to start thinking on how to incorporate reasoning into machine learning. If it succeeds, we will see truly next level of AI revolution.

CEO Contentyze, the text editor 2.0, PhD in maths, Forbes 30 under 30 — → Sign up for free at

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store