Polygames is a new open-source AI research framework for training agents to master strategy games through self-play, rather than by studying extensive examples of successful gameplay. Because it is more flexible and has more features than previous frameworks, Polygames can help researchers advance and benchmark a broad range of zero learning (ZL) techniques, which don’t require training data sets.
Polygames’ architecture makes it compatible with more kinds of games than previous systems such as AlphaZero and ELF OpenGo. Supported games include Breakthrough, Hex, Havannah, Minishogi, Connect6, Minesweeper, Mastermind, EinStein würfelt nicht!, Nogo, and Othello. In addition to building and evaluating ZL methods across a variety of games, Polygames allows researchers to study transfer learning, meaning how well a model trained on one game performs at others. Polygames provides a library of included games, as well as a single-file API for implementing your own game. We demonstrated the effectiveness of Polygames as a training tool through strong model performance in various game competitions, including producing the first bot to beat a top-tier human player at 19×19 Hex. In addition to sharing our approach to building Polygames, we are open-sourcing the full framework, which is available on GitHub.
What it does
While most AI systems learn to master tasks through training on a carefully curated data set of past examples of success — such as processing the moves that led to various winning games of chess — ZL techniques force systems to learn without large quantities of task-specific examples. Similar to self-supervised learning methods, ZL has the long-term potential to reduce the need for resource-intensive training data sets. Polygames advances on previous similar frameworks in several important ways:
Models use fully convolutional networks, in which every layer is convolutional, so they can take into account the spatial structure of a given action space and thus learn the related task more quickly. This is a departure from most game-based architectures, which also use fully connected layers.
This structure also enables models to train on one board size and then perform well on larger and smaller ones.
In our tournament mode, we retain a group of earlier models that performed well, to reduce the chances of catastrophic forgetting (also known as the red queen effect), in which systems forget how to win against earlier iterations of themselves.
Because Polygames’ models are incremental — the framework comes with a script for adding new layers and channels or increasing kernel width — they’re capable of warm start training, allowing the neural network to grow as it trains. This neuroplasticity speeds up the overall training process.
Polygames works with a wider range of games than similar frameworks, including single-player games such as Minesweeper and Mastermind, and stochastic games, such as EinStein würfelt nicht!
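To illustrate why a fully convolutional model is not tied to a single board size, here is a minimal sketch in plain Python. It is not Polygames code: the helper names (conv2d_same, policy_logits), the single-channel 3×3 kernels, and the random boards are all assumptions made for illustration. The point is only that the same small set of weights yields one score per cell on both a 13×13 and a 19×19 board.

```python
import random


def conv2d_same(board, kernel):
    """Slide a square, odd-sized kernel over a 2D board with zero padding,
    so the output has the same height and width as the input."""
    rows, cols = len(board), len(board[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for dr in range(k):
                for dc in range(k):
                    rr, cc = r + dr - pad, c + dc - pad
                    # Cells outside the board count as zero (zero padding).
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += board[rr][cc] * kernel[dr][dc]
            out[r][c] = total
    return out


def relu(grid):
    return [[max(0.0, v) for v in row] for row in grid]


def policy_logits(board, kernels):
    """Apply a stack of convolutional layers (ReLU between layers, none after
    the last) and return one score per board cell."""
    x = board
    for i, kernel in enumerate(kernels):
        x = conv2d_same(x, kernel)
        if i < len(kernels) - 1:
            x = relu(x)
    return x


rng = random.Random(0)

# Hypothetical weights: two 3x3 kernels whose shape does not depend on board size.
kernels = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
           for _ in range(2)]


def random_board(n):
    # -1, 0, 1 stand in for opponent stone, empty cell, own stone.
    return [[rng.choice([-1.0, 0.0, 1.0]) for _ in range(n)] for _ in range(n)]


small = policy_logits(random_board(13), kernels)
large = policy_logits(random_board(19), kernels)
print(len(small), len(small[0]))
print(len(large), len(large[0]))
```

A fully connected layer, by contrast, has a fixed number of inputs, so its weights are bound to the board size it was trained on.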
Why it matters
Polygames’ flexible architecture makes ZL techniques faster and more versatile, and improves the ability of models to generalize to more tasks and environments. For example, a model trained to work with a game that uses dice and provides a full view of the opposing player’s pieces can perform well at Minesweeper, which has no dice, has a single player, and relies on a partially observable board. Our Polygames-trained models have also delivered winning results in a variety of game competitions and individual matches, including earning gold medals in the Breakthrough, Connect6, and Othello 10×10 categories at TAAI 2019 in Taiwan and beating a top-level human player at Hex. That Hex win, a first for a bot, demonstrated our framework’s versatility, since the model had been trained on a version of the game with a 13×13 board and was able to succeed on a larger, 19×19 board.
Polygames’ performance suggests the long-term potential of using ZL for real-world applications. For example, we’ve already used the framework to tackle mathematics problems related to Golomb rulers, which are used to optimize the positioning of electrical transformers and radio antennae. Given Polygames’ open design and compatibility with additional games, we look forward to seeing how other researchers use it to advance the state of game-evaluated ZL techniques.
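For readers unfamiliar with the Golomb rulers mentioned above: a Golomb ruler is a set of integer marks in which no two pairs of marks are the same distance apart. The short check below illustrates that property only. It says nothing about how Polygames searches for such rulers, and the function name is_golomb_ruler is our own choice for this sketch.

```python
from itertools import combinations


def is_golomb_ruler(marks):
    """Return True if all marks are distinct and every pairwise distance
    between marks occurs exactly once."""
    if len(set(marks)) != len(marks):
        return False
    distances = [abs(a - b) for a, b in combinations(marks, 2)]
    return len(distances) == len(set(distances))


# Distances 1, 4, 6, 3, 5, 2 are all different.
print(is_golomb_ruler([0, 1, 4, 6]))  # True
# Distance 1 occurs twice (0-1 and 1-2), so this is not a Golomb ruler.
print(is_golomb_ruler([0, 1, 2, 4]))  # False
```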
Get it on GitHub
Paper: Polygames: Improved zero learning
This research was the result of a large-scale collaboration among researchers at Facebook AI; Tristan Cazenave at Université Paris-Dauphine; Yen-Chi Chen at National Taiwan Normal University; Chen-Ling Li, Guan-Wei Chen, Hsin-I Lin, Maria Elsa, Shi-Cheng Ye, Shi-Jim Yen, Shi-Yu Chen, Xian-Dong Chiu, Yi-Jun Ye, and Yu-Jin Lin at National Dong Hwa University; and Julien Dehos and Fabien Teytaud at Université du Littoral Côte d’Opale.