Why it’s impressive that an AI can play Stratego

A new AI dubbed “DeepNash” has mastered Stratego, one of the few legendary board games in which computers don’t regularly beat human players, according to a paper published this week. It’s a huge and surprising result – at least for the Stratego community.

Stratego is a game with two distinct challenges: it demands long-term strategic thinking (like chess) and also requires players to cope with imperfect information (like poker). The goal is to move across the board and capture the other player’s flag. Each game takes place on a 10 x 10 grid board with two 2 x 2 square lakes blocking the center. Both players deploy 40 pieces of different ranks at the start of the game – the catch is that you can’t see what your opponent’s pieces are, and they can’t see yours. When planning an attack, you don’t know whether the defender is a high-ranking marshal who will beat almost all of your pieces, or a low-ranking sergeant who can be taken out by a lieutenant or captain. Other pieces include bombs (powerful but immobile), scouts (which can move more than one square at a time), and miners (which can defuse bombs), all of which add to the tactical complexity. The game does not end until a player’s flag has been captured or they can no longer make any legal moves.

All of this means that Stratego poses a unique computational challenge. Chess is comparatively easy because all information is visible to both players – in game theory this is called a “perfect information game.” A computer can look at your defense, simulate ten or so moves ahead for a few different options, and pick the best one. That gives it a serious strategic advantage over even the best human players. It also helps that chess is a game won or lost in a few key moments rather than through gradual pressure: the average chess game lasts around 40 moves, while a Stratego game lasts more than 380. Each chess move therefore matters a lot (and rewards deep calculation), while Stratego play is faster and looser.
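The lookahead described above can be sketched with minimax, the textbook perfect-information search that chess engines build on. The toy “game” below is just a hand-built tree of payoffs – an illustration of the idea, not anything from the paper.

```python
# Minimal sketch of perfect-information game-tree search (minimax).
# Leaves are numbers (payoffs for the maximizing player); internal
# nodes are lists of child nodes (the available moves).

def minimax(node, maximizing):
    """Return the best payoff achievable from `node` with optimal play."""
    if isinstance(node, (int, float)):  # leaf: game over, value known
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Because both players can see the whole tree, the best move is
# fully computable by brute force.
game_tree = [
    [3, [5, 1]],   # move A: the opponent then picks our worst outcome
    [[6, 2], 4],   # move B
]
print(minimax(game_tree, maximizing=True))  # → 4
```

Real engines add pruning and evaluation heuristics, but the principle is the same: with perfect information, search alone goes a long way.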

[Related: Meta’s new AI can use deceit to conquer a board game world]

Stratego, on the other hand, is an “imperfect information game.” Until an opposing piece attacks or is attacked, you have no way of knowing what it is. In poker – an imperfect information game that computers have been able to play at a high level for years – there are 10^164 possible game states, and each player has only about 10^3 possible two-card starting hands. In Stratego, there are 10^535 possible states and more than 10^66 possible deployments – meaning there’s far more unknown information to consider. And that compounds the strategic challenge.
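One way to picture what “imperfect information” means in practice is to track a probability distribution over what a hidden enemy piece might be, and update it as evidence arrives. The piece counts below follow the standard Stratego setup; the update logic is a simple illustrative sketch, not DeepNash’s actual method.

```python
# Sketch: a belief over an unknown Stratego piece, updated by evidence.
from fractions import Fraction

# Prior: a hidden enemy piece is any of the 40 starting pieces.
# ("other" lumps together colonels, majors, captains, lieutenants, spy.)
counts = {"marshal": 1, "general": 1, "miner": 5, "scout": 8,
          "sergeant": 4, "bomb": 6, "flag": 1, "other": 14}
total = sum(counts.values())  # 40
belief = {rank: Fraction(c, total) for rank, c in counts.items()}

# Observation: the piece just moved, so it cannot be a bomb or the
# flag (both are immobile). Condition the belief on that evidence.
for immobile in ("bomb", "flag"):
    belief[immobile] = Fraction(0)
remaining = sum(belief.values())
belief = {rank: p / remaining for rank, p in belief.items()}

print(belief["marshal"])  # 1/33 after ruling out the 7 immobile pieces
```

DeepNash must reason over vastly larger belief spaces than this, but the core difficulty is the same: every decision is made against a distribution over what the opponent might have, not a known position.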

Taken together, the two challenges make Stratego particularly difficult for computers (and AI researchers). According to the team, it is “not possible to use state-of-the-art model-based perfect information planning techniques, nor state-of-the-art imperfect information search techniques that break down the game into independent situations.”

But DeepNash pulled it off. Researchers used a novel method that allowed the AI to teach itself to play Stratego while developing its own strategies. It used a model-free reinforcement learning algorithm called Regularized Nash Dynamics (R-NaD), in combination with a deep neural network architecture, to pursue a Nash equilibrium – “an unexploitable strategy in two-player zero-sum games” like Stratego – and by doing so it could “learn the qualitative behavior that one could expect from a top player.” This approach had previously been used in simple Prisoner’s Dilemma-style games, but never before in a game as complex as this one.
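To see what “converging to a Nash equilibrium” means in a two-player zero-sum game, here is a toy demonstration using fictitious play on matching pennies – a much simpler learning scheme than R-NaD, but aimed at the same target: an unexploitable mixed strategy (here, 50/50).

```python
# Toy Nash-equilibrium convergence via fictitious play (not R-NaD).
# Matching pennies: the row player wins when the coins match.
payoff = [[1, -1],   # row payoffs; rows/cols are (heads, tails)
          [-1, 1]]

row_counts = [1, 0]  # empirical counts of each player's past actions
col_counts = [0, 1]

for _ in range(10000):
    # Each player best-responds to the opponent's empirical mixture.
    row_best = max(range(2), key=lambda a: sum(payoff[a][b] * col_counts[b] for b in range(2)))
    col_best = min(range(2), key=lambda b: sum(payoff[a][b] * row_counts[a] for a in range(2)))
    row_counts[row_best] += 1
    col_counts[col_best] += 1

total = sum(row_counts)
print([c / total for c in row_counts])  # approaches [0.5, 0.5]
```

At equilibrium, neither player can gain by deviating – which is exactly the “unexploitable” property the quoted passage describes, scaled up by DeepNash to a game with 10^535 states.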

DeepNash was tested against the best existing Stratego bots and against experienced human players. It beat all the other bots and was highly competitive against expert humans on Gravon, an online board game platform. Better still, it played well in qualitative terms: it could trade off capturing material against concealing its pieces’ identities, execute bluffs, and even make calculated gambles. (Though the researchers also acknowledge that terms like “deception” and “bluff” may refer to mental states that DeepNash is incapable of having.)

All in all, it’s an exciting demonstration of a new way to train AI models to play games (and perhaps perform other similar tasks in the future) – one that doesn’t rely on the computationally intensive deep-search strategies previously used for games like chess, Go, and poker.
