
CHIPS WITH EVERYTHING

Written by: Spelen-Play65 Team
Saturday, August 05, 2006



Much of the renewed interest in the game stems from its connection with computing technology: not only because backgammon is being packaged as a hot new form of online gaming or because a PC is a tireless and always-available opponent, but because backgammon has become one of the success stories of research into artificial intelligence (AI).

Self-taught backgammon-playing computer programs are so good that they have overturned many of the assumptions previously held about how the game should be played and which are the best moves, especially in the opening phases of the game.

Any player who trains by regularly playing against a backgammon program, and by studying the moves the program itself suggests, can learn to play at a level previously attained by only a few top players. Top chess-playing computers tend to use brute-force tree searching and rely on massive amounts of computing power in a hybrid hardware/software platform; a world-class backgammon program, by contrast, will run on any PC.

Hydra, which is the world’s most powerful chess-playing computer and has never been beaten, is a cluster computer based on 32 x 3.06GHz Xeon processors, each paired with its own FPGA (Field Programmable Gate Array) device. The cluster is capable of evaluating 200 million positions per second.

It is able to look 18 moves ahead, which is six more than IBM’s Deep Blue. In contrast, the top backgammon programs, which are based on neural net techniques, will run happily on a bog-standard Pentium under Windows 95.

Learning by doing

Backgammon-playing programs are entirely self-taught, using the technique of reinforcement learning, which is one of the most promising avenues of research into artificial intelligence.

In essence, it’s no different from the way we train dogs, dishing out a reward when the animal does what we want it to and withholding the reward when it doesn’t.

In AI, the trainee, or agent, is a neural network, and the aim of the agent is to maximise the total amount of reward it receives. An agent won’t respond at all to a biscuit or a pat on the head, so the reward is a numerical one based on the agent’s most recent action.

What the agent must try to do is maximise the cumulative reward it receives for succeeding in its goal and not get hung up on the immediate reward for making one good decision.

The goal in any game is, of course, to win, and it’s easy to score the outcome by awarding +1 point for a win, -1 point for a loss and 0 points for a draw or uncompleted game.
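The scoring scheme can be sketched in a few lines of Python. This is only an illustration: the function names and the outcome labels are invented here, and the example game assumes no reward at all until the final move.

```python
from typing import List


def terminal_reward(outcome: str) -> int:
    """Score a finished (or abandoned) game from the agent's point of view."""
    # Outcome labels are hypothetical; the values follow the +1 / -1 / 0 scheme.
    rewards = {"win": 1, "loss": -1, "draw": 0, "unfinished": 0}
    return rewards[outcome]


def cumulative_reward(step_rewards: List[float]) -> float:
    """Total reward over one game: the quantity the agent tries to maximise."""
    return sum(step_rewards)


# Assumed example: three moves with no immediate reward, then a win.
game = [0.0, 0.0, 0.0, float(terminal_reward("win"))]
print(cumulative_reward(game))
```

A move that earns nothing on the spot can still be the right one, because only the total at the end of the game counts.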

An agent needs to understand the environment in which it operates and must know when it has achieved its goal, but in the case of backgammon this is incredibly simple.

A backgammon board is basically a one-dimensional race track split into 24 segments, with opponents racing in opposite directions and all checkers moving identically. A draw is impossible and the goal has been achieved when the agent gets all its checkers round the track before its opponent.
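To make the race-track picture concrete, here is a minimal Python sketch of one side's checkers on the 24 segments. It is a deliberate simplification: it ignores dice, hitting, blocking and the bearing-off rules. The names are invented for illustration, and the starting layout is the standard backgammon setup, which the article itself does not spell out.

```python
from typing import List

NUM_POINTS = 24  # the 24 segments of the track


def starting_track() -> List[int]:
    """One side's checkers, indexed by distance from home (index 0 is nearest)."""
    track = [0] * NUM_POINTS
    # Assumed standard setup: 2 on the 24-point, 5 on the 13-point,
    # 3 on the 8-point and 5 on the 6-point (index = point number - 1).
    track[23], track[12], track[7], track[5] = 2, 5, 3, 5
    return track


def mirror(index: int) -> int:
    """The same segment as seen by the opponent, who races the other way."""
    return NUM_POINTS - 1 - index


def pip_count(track: List[int]) -> int:
    """Total distance left: a checker on index i needs i + 1 steps to leave."""
    return sum(count * (i + 1) for i, count in enumerate(track))


def move(track: List[int], src: int, steps: int) -> List[int]:
    """Move one checker `steps` segments toward home; past index 0 it leaves."""
    if track[src] == 0:
        raise ValueError("no checker on that segment")
    new_track = list(track)
    new_track[src] -= 1
    dst = src - steps
    if dst >= 0:
        new_track[dst] += 1
    return new_track


def has_won(track: List[int]) -> bool:
    """The goal is reached when no checkers remain on the track."""
    return sum(track) == 0


track = starting_track()
print(pip_count(track), has_won(track))
```

Because both sides use the same representation, with `mirror` translating between their viewpoints, a single evaluation routine can serve both players.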

Timing is the key

The mode of reinforcement learning that has been so successful in teaching computers to play backgammon is called temporal difference (TD) learning, which is based on the differences between temporally successive predictions.

Each move by each notional player in the game (the computer plays both sides) is regarded as a time step, and a heuristic reward signal is sent to the agent after each step and at the end of each game. The agent learns to predict the best move by adjusting the prediction at each time step so that it more closely matches the prediction at the next time step. The difference between successive predictions is the only measure of error, and the program is never explicitly told which move is best.
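The idea of nudging each prediction toward the next one can be sketched as follows. This is not TD-Gammon itself, which uses a neural network. It is the simplest tabular form of the update, with made-up position names, an assumed learning rate of 0.1 and a toy game that always ends in a win.

```python
from typing import Dict, Optional


def td_update(values: Dict[str, float], state: str, next_state: Optional[str],
              reward: float, alpha: float = 0.1, terminal: bool = False) -> float:
    """Move the prediction for `state` toward the next prediction.

    At the end of a game there is no next prediction, so the target is
    simply the final reward. Returns the error that was corrected.
    """
    target = reward if terminal else reward + values[next_state]
    td_error = target - values[state]   # difference between successive predictions
    values[state] += alpha * td_error   # small step toward the target
    return td_error


# Toy game (an assumption for illustration): three positions, then a win (+1).
values = {"opening": 0.0, "middle": 0.0, "endgame": 0.0}
episode = ["opening", "middle", "endgame"]

for _ in range(200):  # replay the same game many times
    for i, state in enumerate(episode):
        last = i == len(episode) - 1
        next_state = None if last else episode[i + 1]
        reward = 1.0 if last else 0.0
        td_update(values, state, next_state, reward, alpha=0.1, terminal=last)

print(values)
```

Only the last position is ever told the result directly. The earlier positions learn it second-hand, one step at a time, from the predictions that follow them.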

Gerald Tesauro, an IBM researcher, is responsible for pioneering TD techniques with backgammon. He developed his program, TD-Gammon, after abandoning experiments with a supervised learning program called Neurogammon, which was trained on examples of good and bad moves labelled by a human expert.

Neurogammon never reached an expert level of play, whereas TD-Gammon went on improving for 1,500,000 games and became a world-class player. Readers with long memories may recall that a version of TD-Gammon was included in the 1996 Family Funpak for OS/2 Warp.

The next commercially available neural net program came in 1998, in the form of Fredrik Dahl’s Jellyfish, and this was soon followed by Olivier Egger’s Snowie. The current version of Snowie is regarded as the state of the art in terms of its playing skills and analysis tools, and it is priced accordingly.

However, there is a free alternative in the form of GNU Backgammon. This was the brainchild of Gary Wong, who by 1999 had drawn on the work of Tesauro and others to produce a neural net backgammon player called Costello.

He donated his code to the GNU Project, and GNU Backgammon (as it became known) is still under development. It plays an extremely strong game and has not stopped learning.

A version of it plays on the First Internet Backgammon Server (FIBS), where it ranks in the top 20 of over 6,000 players.


