Navigation

  • index
  • next |
  • previous |
  • cmpt310summer2019 documentation »

Table Of Contents

  • Adversarial Search
    • Note
    • Min/max Search
    • Chess, Go, and AlphaZero

Previous topic

Notes on Chapter 4: Beyond Classical Search

Next topic

Notes on Chapter 6: Constraint Satisfaction Problems

Quick search

Adversarial Search¶

Note¶

we’re not going to cover the algorithms in this chapter in significant detail

they are mostly specific to turn-taking games like chess and Go

recent developments in reinforcement learning have dramatically altered the state of the start of game playing algorithms

while min/max alpha-beta search is still a good algorithm that powers many strong programs, reinforcement learning techniques have eclipsed it in notable games, like chess and Go

Min/max Search¶

in games like chess, Go, poker, Connect-4, etc., adversarial search techniques have traditionally been used to create intelligent computer players for these games

assuming a 2-player game like chess, the basic idea is to represent the game as a game tree, where nodes in the tree alternate between one players move and the others

the players are often generically named MIN and MAX, corresponding to how they want to influence the objective function

  • MIN wants to minimize the objective function, while MAX wants to maximize it

the MINIMAX algorithm is the basic search techniques that computers use for making moves

the basic idea is to evaluate the leaves of the game tree, and then to work upwards from those using the minimax rule to assign values to the nodes for each player

in practice, MINIMAX search is usually implement with alpha-beta pruning, an effective technique for efficiently discarding certain parts of the game tree

dozens of other tricks, techniques, and ideas have been tried and tested over the decades of research into these algorithms, so that today the best adversarial search algorithms are highly optimized and efficient

Chess, Go, and AlphaZero¶

until about 2017, variations of alpha-beta search were the best chess-playing agents

they essentially combine a lot of human-added chess knowledge with an extremely efficient alpha-beta search

but AlphaZero surprised the chess world by showing that reinforcement learning could be used to learn to play chess at a level comparable to the best alpha-beta style chess agents

furthermore, AlphaZero did its learning in less than 10 hours of self-play (i.e. playing thousands and thousands of games against itself)

  • although, to be fair, it did use a huge amount of computational power in those few hours, so much that only a big company like Google could realistically do it

also, AlphaZero does use some searching, but it does not use alpha-beta search

instead, it uses a different technique called Monte-Carlo Tree search (MCTS)

  • the basic idea of MCTS is to pick random successor states, and keep going until the game is played out
  • this is repeated thousands (or more) times
  • then the move that is actually selected is the one that leads to the most wins
  • AlphaZero uses a modified version of this: it uses its learned knowledge (in the form of a neural network) to decide what move to make when simulating the games

consider the following news report from April 2019:

  • AlphaZero played 1000 chess games against StockFish, generally considered to be the strongest (or among the strong) traditional alpha-beta style chess agents
    • the games gave each player 3 hours to play their moves, plus 15 seconds extra for each move
  • AlphaZero won 155 of the games, lost 6, and drew the remaining 839
  • this is generally taken as good evidence that AlphaZero is probably the strongest chess-playing agent that has ever existed, human or otherwise
  • using MCTS, AlphaZero searched about 60,000 positions per second
  • using alpha-beta search, StockFish searched about 60 million positions per second, i.e. 1000 more positions per second than AlphaZero
  • even though StockFish was greatly out-searching AlphaZero, it appears that AlphaZero’s knowledge of chess is superior
  • Stockfish has an ELO rating around 3378, and based on this performance AlphaZero has an ELO rating around 3430
    • the higest ever ELO rating rating of a human chess player was for Magnus Carlsen, who had a rating of 2882 in2014
      • this is no match for computers: computers are now much better chess players than any human

as impressive as this is, another remarkable fact about AlphaZero is that it is not specific to chess: it has also been used to create the best Go and Shogi agents

  • famously, Go is a game that the alpha-beta game tree search approach doesn’t work well with because the branching factor of the game tree is so big

the dramatic success of AlphaZero has sparked the imagination of many people: if a computer can learn more about chess in ten hours than humans have learned in centuries, how long before computers start to similarly out-perform humans in other disciplines?

  • but there is still a lot of tough problems to solve — reinforcement learning doesn’t work so well with all games or problems!

Navigation

  • index
  • next |
  • previous |
  • cmpt310summer2019 documentation »
© Copyright 2019, Toby Donaldson. Created using Sphinx 1.7.9.