Monte Carlo Tree Search Tic-Tac-Toe -- Poor Agent


I'm trying to implement Monte Carlo tree search to play tic-tac-toe in Python. My current implementation is as follows:
I have a Board class that handles alterations to the tic-tac-toe board. The state of the board is represented by a 2x3x3 numpy array, where each of the two 3x3 matrices is a binary matrix: the first marks the presence of X's and the second the presence of O's.
import numpy as np
class Board:
    '''
    class handling state of the board
    '''
    def __init__(self):
        self.state = np.zeros([2,3,3])
        self.player = 0 # current player's turn
    def copy(self):
        '''
        make copy of the board
        '''
        copy = Board()
        copy.player = self.player
        copy.state = np.copy(self.state)
        return copy
    def move(self, move):
        '''
        take move of form [x,y] and play
        the move for the current player
        '''
        if np.any(self.state[:,move[0],move[1]]): return
        self.state[self.player][move[0],move[1]] = 1
        self.player ^= 1
    def get_moves(self):
        '''
        return remaining possible board moves
        (ie where there are no O's or X's)
        '''
        return np.argwhere(self.state[0]+self.state[1]==0).tolist()
    def result(self):
        '''
        check rows, columns, and diagonals
        for sequence of 3 X's or 3 O's
        '''
        board = self.state[self.player^1]
        col_sum = np.any(np.sum(board,axis=0)==3)
        row_sum = np.any(np.sum(board,axis=1)==3)
        d1_sum = np.any(np.trace(board)==3)
        d2_sum = np.any(np.trace(np.flip(board,1))==3)
        return col_sum or row_sum or d1_sum or d2_sum
I then have a Node class, which handles properties of nodes as the search tree is being built:
from math import sqrt, log
class Node:
    '''
    maintains state of nodes in
    the monte carlo search tree
    '''
    def __init__(self, parent=None, action=None, board=None):
        self.parent = parent
        self.board = board
        self.children = []
        self.wins = 0
        self.visits = 0
        self.untried_actions = board.get_moves()
        self.action = action
    def select(self):
        '''
        select child of node with
        highest UCB1 value
        '''
        s = sorted(self.children, key=lambda c:c.wins/c.visits+0.2*sqrt(2*log(self.visits)/c.visits))
        return s[-1]
    def expand(self, action, board):
        '''
        expand parent node (self) by adding child
        node with given action and state
        '''
        child = Node(parent=self, action=action, board=board)
        self.untried_actions.remove(action)
        self.children.append(child)
        return child
    def update(self, result):
        self.visits += 1
        self.wins += result
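For reference, the sort key in select() is the UCB1 score with an exploration weight of 0.2. For a child i with w_i accumulated wins and n_i visits, under a parent with N visits, the code computes:

$$\text{UCB1}(i) = \frac{w_i}{n_i} + 0.2\sqrt{\frac{2\ln N}{n_i}}$$

The textbook UCB1 formula uses a weight of 1 on the square-root term; a weight of 0.2 makes the search favor exploitation of already good children over exploration.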
Finally, I have a UCT function that pulls everything together. This function takes a Board object and builds the Monte Carlo search tree to determine the next best possible move from the given board state:
import random
def UCT(rootstate, maxiters):
    root = Node(board=rootstate)
    for i in range(maxiters):
        node = root
        board = rootstate.copy()
        # selection - select best child if parent fully expanded and not terminal
        while node.untried_actions == [] and node.children != []:
            node = node.select()
            board.move(node.action)
        # expansion - expand parent to a random untried action
        if node.untried_actions != []:
            a = random.choice(node.untried_actions)
            board.move(a)
            node = node.expand(a, board.copy())
        # simulation - rollout to terminal state from current
        # state using random actions
        while board.get_moves() != [] and not board.result():
            board.move(random.choice(board.get_moves()))
        # backpropagation - propagate result of rollout game up the tree
        # reverse the result if player at the node lost the rollout game
        while node != None:
            result = board.result()
            if result:
                if node.board.player==board.player:
                    result = 1
                else: result = -1
            else: result = 0
            node.update(result)
            node = node.parent
    s = sorted(root.children, key=lambda c:c.wins/c.visits)
    return s[-1].action
I've scoured this code for hours and simply can't find the error in my implementation. I've tested numerous board states and pitted two agents against each other, but the function returns poor actions for even the simplest board states. What am I missing, and/or what is wrong with my implementation?
edit: here is an example of how two agents might be implemented to play:
b = Board() # instantiate board
# while there are moves left to play and neither player has won
while b.get_moves() != [] and not b.result():
    a = UCT(b,1000) # get next best move
    b.move(a) # make move
    print(b.state) # show state

The problem appears to be as follows:
Your get_moves() function does not check if the game is already over. It can generate a non-empty list of moves for states where someone has already won.
When creating a new Node, you also don't check if the game state is already over, so a non-empty list of untried_actions is created.
In the Selection and Expansion phases of the algorithm, you also don't check for terminal game states. Once the Expansion phase hits a node that contains a game state where someone already won, it will happily apply an extra action and create a new node for the tree again, which subsequent Selection phases will also happily keep going through.
For these nodes where the game continues being played after someone already won, result() can return an incorrect winner. It simply checks if the most recent player to make a move won, which is correct if you stop playing as soon as someone wins, but can be incorrect if you keep playing after someone already won. So, you propagate all kinds of incorrect results through the tree.
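To make the last point concrete, here is a small sketch that reproduces the misreported winner. It is a pure-Python stand-in for the question's Board, not the original class: it assumes a flat list of 9 cells (0 for X, 1 for O, None for empty), and ToyBoard, result_last_mover and winner are hypothetical names. result_last_mover mirrors the question's result(), which only inspects the player who just moved.

```python
# The eight winning lines of a board stored as a flat list of 9 cells.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


class ToyBoard:
    """Hypothetical stand-in for the question's Board (0 = X, 1 = O, None = empty)."""

    def __init__(self):
        self.cells = [None] * 9
        self.player = 0  # 0 = X to move, 1 = O to move

    def move(self, i):
        # Like the question's move(): ignore occupied squares, then switch player.
        if self.cells[i] is None:
            self.cells[i] = self.player
            self.player ^= 1

    def result_last_mover(self):
        # Mirrors the question's result(): only the player who just moved is checked.
        last = self.player ^ 1
        return any(all(self.cells[c] == last for c in line) for line in LINES)

    def winner(self):
        # Checks both players, so it stays correct even if play continued.
        for p in (0, 1):
            if any(all(self.cells[c] == p for c in line) for line in LINES):
                return p
        return None


b = ToyBoard()
for square in [0, 3, 1, 4, 2]:  # X completes the top row
    b.move(square)
print(b.result_last_mover())    # True: X just moved and has three in a row
b.move(8)                       # O keeps playing although the game is over
print(b.result_last_mover())    # False: only O is checked, so X's win is missed
print(b.winner())               # 0: X has still won
```

Once a rollout or tree path continues past a win like this, the game can later be scored for the wrong player, which is the incorrect backpropagated result described above.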
The simplest way to fix this is to modify get_moves() so that it returns an empty list when the game is already over. Then these nodes will always fail the if node.untried_actions != [] check, which means the expansion phase gets skipped altogether, and you move straight on to the play-out phase, where there is a proper check for terminal game states. This can be done as follows:
def get_moves(self):
    """
    return remaining possible board moves
    (ie where there are no O's or X's)
    """
    # no moves are left once someone has won
    if self.result():
        return []
    return np.argwhere(self.state[0] + self.state[1] == 0).tolist()
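For comparison, here is a minimal pure-Python UCT sketch with the terminal checks built in. It is not the asker's code and only illustrates the technique under stated assumptions: the board is a flat 9-cell list holding 'X', 'O' or None instead of the 2x3x3 NumPy array, draws count as half a win, each node is scored from the viewpoint of the player who moved into it, and the most-visited root move is played. The helper names (winner, legal_moves, to_move, uct) are illustrative, and uct() assumes the root position has at least one legal move.

```python
import math
import random

# The eight winning lines of a board stored as a flat list of 9 cells.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(cells):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if cells[a] is not None and cells[a] == cells[b] == cells[c]:
            return cells[a]
    return None


def legal_moves(cells):
    """Empty cells, or [] once someone has won (the same idea as the fix above)."""
    if winner(cells):
        return []
    return [i for i, v in enumerate(cells) if v is None]


def to_move(cells):
    """X moves first, so X is to move whenever both players have moved equally often."""
    return 'X' if cells.count('X') == cells.count('O') else 'O'


class Node:
    def __init__(self, cells, parent=None, move=None):
        self.cells = cells
        self.parent = parent
        self.move = move
        # Player who made `move` to reach this node (None at the root).
        self.just_moved = None if move is None else cells[move]
        self.children = []
        self.untried = legal_moves(cells)  # [] at terminal nodes
        self.wins = 0.0
        self.visits = 0

    def ucb_child(self, c=math.sqrt(2)):
        """Child with the highest UCB1 score."""
        return max(self.children, key=lambda ch: ch.wins / ch.visits
                   + c * math.sqrt(math.log(self.visits) / ch.visits))


def uct(cells, iters=2000, rng=random):
    root = Node(list(cells))
    for _ in range(iters):
        node = root
        # Selection: descend while the node is fully expanded and has children.
        while not node.untried and node.children:
            node = node.ucb_child()
        # Expansion: skipped at terminal nodes because their untried list is empty.
        if node.untried:
            m = node.untried.pop(rng.randrange(len(node.untried)))
            child_cells = list(node.cells)
            child_cells[m] = to_move(node.cells)
            child = Node(child_cells, parent=node, move=m)
            node.children.append(child)
            node = child
        # Simulation: random play-out that stops as soon as someone wins.
        sim = list(node.cells)
        while legal_moves(sim):
            sim[rng.choice(legal_moves(sim))] = to_move(sim)
        w = winner(sim)
        # Backpropagation: score each node for the player who moved into it.
        while node is not None:
            node.visits += 1
            if w is None:
                node.wins += 0.5  # draw
            elif w == node.just_moved:
                node.wins += 1.0  # win for the player who made this move
            node = node.parent
    # Play the most-visited move at the root.
    return max(root.children, key=lambda ch: ch.visits).move


# X (to move) can win immediately on square 2.
print(uct(['X', 'X', None, 'O', 'O', None, None, None, None], rng=random.Random(0)))
```

Because legal_moves() returns [] for a won position, a terminal node never receives children, and a play-out starting there ends immediately with the correct winner.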

Related

Problem setting the best move in the Negamax algorithm

Hey, I am trying to make a chess engine. When I run the code it plays fine, but sometimes it does not set the best move, and it shows an error because the search did not return a move to play. My negamax includes alpha-beta pruning, mate-distance pruning, quiescence search, iterative deepening, and transposition tables. Here is my implementation.
from collections import namedtuple
Entry=namedtuple('Entry', 'score depth flag')
class Search():
    def __init__(self):
        self.nodes=0
        self.tt_score={}
    def search(self,position,api):
        self.nodes=0
        for depth in range(1,1000):
            ret=self.negamax(position,api,-INFINITY,INFINITY,depth,ply=1)
            yield ret
    def negamax(self,position,api,alpha,beta,depth=3,ply=1):
        best,bmove=-INFINITY,()
        for move in position.moves():
            score=self.negasearch(position.move(move),-beta,-alpha,depth-1,ply+1)
            if score>=beta: return [move,score,self.nodes]
            if score>best:
                best,bmove=score,move
                #api.arrow("clear")
                #api.arrow(coordinates[move[0]]+coordinates[move[1]])
            if score>alpha: alpha=score
        return [bmove,best,self.nodes]
    def negasearch(self,position,alpha,beta,depth,ply):
        best,aorig=-INFINITY,alpha
        self.nodes+=1
        if MATE-ply<beta:
            beta=MATE-ply
            if alpha>=MATE-ply: return MATE-ply
        if -MATE+ply>alpha:
            alpha=-MATE+ply
            if beta<=-MATE+ply: return -MATE+ply
        entry=self.tt_score.get(position.hash(),Entry(0,-1,'exact'))
        if entry.depth>=depth:
            if entry.flag=='exact': return entry.score
            elif entry.flag=='lower': alpha=max(alpha,entry.score)
            elif entry.flag=='upper': beta=min(beta,entry.score)
            if alpha>=beta: return entry.score
        if depth<=0: return self.qsearch(position,alpha,beta,ply+1)
        for move in position.moves():
            score=self.negasearch(position.move(move),-beta,-alpha,depth-1,ply+1)
            if score>=beta:
                best=score
                break
            if score>best:
                best=score
            if score>alpha: alpha=score
        if best<=aorig: self.tt_score[position.hash()]=Entry(best,depth,'upper')
        elif best>=beta: self.tt_score[position.hash()]=Entry(best,depth,'lower')
        else: self.tt_score[position.hash()]=Entry(best,depth,'exact')
        return best
    def qsearch(self,position,alpha,beta,ply):
        stand_pat=position.score
        if stand_pat>=beta: return beta
        if alpha<stand_pat: alpha=stand_pat
        self.nodes+=1
        for move in position.moves():
            if move[2]==0: continue
            score=self.qsearch(position.move(move),-beta,-alpha,ply+1)
            if score>=beta: return beta
            if score>alpha: alpha=score
        return alpha
You have probably already noticed that I do not negate the score each turn, as negamax normally requires. The reason is that the move function returns the new position with its score already negated. For example:
def move(self,move):
    # copy the current position board, turn, and score
    # if move is a capture then perform:
    #     score+=pvs[pieceBeingCaptured]
    # perform move
    # if the next player is mated after this move then set score to MATE
    return Position(board,-score,self.turn*-1)
It is not the move generation, because it returns the correct moves for every position. With best starting at -INFINITY, negamax should always set the best move. Any help would be appreciated.
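For comparison, here is a minimal sketch of textbook alpha-beta negamax on a hypothetical toy tree (nested lists, with each leaf an int scored from the point of view of the side to move at that leaf). It is not a diagnosis of the engine above, but two details are worth checking against it. First, in textbook negamax the recursive result is negated even when the static evaluation is already side-relative, because each call returns a score from the perspective of the side to move in that position. Second, the first move searched is always recorded, so a move is returned even if every score ties with the initial -infinity (for example, when a child has no legal moves and returns -infinity itself).

```python
import math

# Hypothetical toy game tree: an inner node is a list of children; a leaf is an
# int scored from the point of view of the side to move at that leaf.
TREE = [[3, [5, -2]], [[-4, 1], 6], [0]]


def negamax(node, alpha, beta):
    """Alpha-beta negamax; returns (score, index of best child or None)."""
    if isinstance(node, int):
        return node, None
    best, best_move = -math.inf, None
    for i, child in enumerate(node):
        # The child's score is from the opponent's view, so negate it,
        # and search the child with the window negated and swapped.
        score = -negamax(child, -beta, -alpha)[0]
        if best_move is None or score > best:
            best, best_move = score, i  # the first move searched is always recorded
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # beta cutoff: the opponent will avoid this line
    return best, best_move


print(negamax(TREE, -math.inf, math.inf))  # (4, 1)
```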

Searching a node in two deques simultaneously and printing the values in the queues - Python 3

I have code for a breadth-first search algorithm.
According to the third-to-last line of the algorithm, we have to check that the child state is not already in either of the queues before performing any further action. In my code, I have to make the BFS algorithm search for a path from a start point to an end point in a maze. I have made a node that saves the state, actions, path and parent for a grid location. The code is as follows:
#class to initialize node
class Node:
    state = []
    actions = []
    parent = []
    path = 0
    def __init__(self, state, actions, parent, path):
        self.state = state
        self.actions = actions
        self.parent = parent
        self.path = path
In the algorithm function, I am adding nodes, starting from the start point, to a deque in Python. But before adding, I have to check the condition that child.state is not in the frontier or explored queues. With my condition, the loop runs forever. It is as follows:
def bfs_search(maze_size, start_point, end_point, grid_values, number_of_grids):
    #code for creating first node and adding to queue
    while True:
        #here we have the code to break the loop if queue gets empty and then finding the child
        #pointer and creating child nodes
        #this condition does not execute as it should
        if child_node.state not in frontier and child_node.state not in explored:
            if child_node.state == end_point:
                goal = True
                explored.appendleft(child_node)
                break
            else:
                frontier.appendleft(child_node)
I also want to print the elements appended to the queue while testing the code to check if the nodes are being appended to the queue.
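One likely reason the check never filters anything: frontier and explored hold Node objects, while child_node.state is a state, and since the Node class shown does not define __eq__, a state never compares equal to a Node, so `child_node.state not in frontier` is always true. The sketch below avoids that by tracking states directly and prints the frontier on each step. It is a hypothetical rewrite, not the original function: it assumes the maze is a grid of 0 (open) and 1 (wall) cells, states are (row, col) tuples, and the goal test happens when a state is taken off the queue.

```python
from collections import deque


def bfs_search(grid, start, goal):
    """Breadth-first search on a grid of 0 (open) / 1 (wall) cells.

    Returns the list of cells from start to goal, or None if unreachable.
    """
    frontier = deque([start])
    # parent maps every state already in the frontier or explored to its parent,
    # so one dictionary lookup replaces the two deque membership checks.
    parent = {start: None}
    while frontier:
        print("frontier:", list(frontier))  # debugging output
        state = frontier.popleft()
        if state == goal:
            path = []
            while state is not None:  # walk back to the start
                path.append(state)
                state = parent[state]
            return path[::-1]
        r, c = state
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            child = (r + dr, c + dc)
            in_bounds = 0 <= child[0] < len(grid) and 0 <= child[1] < len(grid[0])
            if in_bounds and grid[child[0]][child[1]] == 0 and child not in parent:
                parent[child] = state
                frontier.append(child)
    return None


maze = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(bfs_search(maze, (0, 0), (2, 0)))
```

Using hashable tuples as states makes the "already seen" test a constant-time dictionary lookup instead of a linear scan of two deques.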

A mini max game search tree gives error: TypeError: '>' not supported between instances

I'd like to perform the minimax algorithm on the 2048 game. To do so, I've first created a tree:
class Tree(object):
    def __init__(self, num_sons, data, parent=None):
        self.data = [None, data]
        self.sons = []
        self.index_son = 0
        self.parent = parent
        for i in range(num_sons):
            self.sons.append(None)
    def add_son(self, son):
        self.sons[self.index_son] = son
        self.index_son += 1
    def get_son(self, index):
        return self.sons[index]
    def is_terminal(self):
        return self.index_son == 0
Basically, every node has a different number of children, one for each possible action. The actions of Max are the usual actions in 2048 (moving right, up, down, or left), and the actions of Min are creating additional 2 blocks (whenever you don't make progress in 2048, additional 2 blocks are created on the edges).
I've created the tree so that the terminals hold the values.
Now I want to use the minimax algorithm to initialize the data in every node of the tree.
Trying this:
def minimax(self, root, player_turn):
    if root.is_terminal():
        return root.data[1]
    else:
        if player_turn % 2 == 0:
            for i in range(root.index_son):
                root.data[1] = max(root.data[1], self.minimax(root.get_son(i), 1))
        if player_turn % 2 == 1:
            for i in range(root.index_son):
                root.data[1] = min(root.data[1], self.minimax(root.get_son(i), 0))
And I get an error
TypeError: '>' not supported between instances of 'NoneType' and 'float'
On this line:
root.data[1] = max(root.data[1], self.minimax(root.get_son(i), 1))
I know I don't return anything if node isn't a terminal but I don't know what to add in those cases.
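The None comes from two places: the non-terminal branch never returns a value, and (assuming inner nodes are created with data=None) root.data[1] starts as None, so max() ends up comparing None with a float. A minimal sketch of the usual shape: start the running value at -infinity for Max and +infinity for Min, store the backed-up value in the node, and return it. The Tree class here is a simplified, hypothetical stand-in for the one above (a value attribute and a list of sons), and maximizing is a boolean in place of the player_turn parity.

```python
import math


class Tree:
    """Simplified stand-in: leaves carry a value, inner nodes carry sons."""

    def __init__(self, value=None, sons=None):
        self.value = value
        self.sons = sons or []

    def is_terminal(self):
        return not self.sons


def minimax(node, maximizing):
    if node.is_terminal():
        return node.value
    if maximizing:
        best = -math.inf  # start below any real score instead of None
        for son in node.sons:
            best = max(best, minimax(son, False))
    else:
        best = math.inf  # start above any real score
        for son in node.sons:
            best = min(best, minimax(son, True))
    node.value = best  # store the backed-up value in the node
    return best        # inner nodes must return a value too


root = Tree(sons=[Tree(sons=[Tree(3.0), Tree(5.0)]),
                  Tree(sons=[Tree(2.0), Tree(9.0)])])
print(minimax(root, True))  # 3.0
```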

Python: Name not defined error even though function is clearly defined before call [duplicate]

This question already has answers here:
function name is undefined in python class [duplicate]
(2 answers)
Closed 5 years ago.
I'm new to Python and having a weird issue with function definitions. I have checked around the forums and made sure to define my function before calling it, but that has not helped. I keep getting a name-not-defined error when I try to call literally any function in this one particular method.
from eight_puzzle import Puzzle
import math
################################################################
### Node class and helper functions provided for your convenience.
### DO NOT EDIT!
################################################################
class Node:
    """
    A class representing a node.
    - 'state' holds the state of the node.
    - 'parent' points to the node's parent.
    - 'action' is the action taken by the parent to produce this node.
    - 'path_cost' is the cost of the path from the root to this node.
    """
    def __init__(self, state, parent, action, path_cost):
        self.state = state
        self.parent = parent
        self.action = action
        self.path_cost = path_cost
    def gen_child(self, problem, action):
        """
        Returns the child node resulting from applying 'action' to this node.
        """
        return Node(state=problem.transitions(self.state, action),
                    parent=self,
                    action=action,
                    path_cost=self.path_cost + problem.step_cost(self.state, action))
    @property
    def state_hashed(self):
        """
        Produces a hashed representation of the node's state for easy
        lookup in a python 'set'.
        """
        return hash(str(self.state))
################################################################
### Node class and helper functions provided for your convenience.
### DO NOT EDIT!
################################################################
def retrieve_solution(node,num_explored,num_generated):
    """
    Returns the list of actions and the list of states on the
    path to the given goal_state node. Also returns the number
    of nodes explored and generated.
    """
    actions = []
    states = []
    while node.parent is not None:
        actions += [node.action]
        states += [node.state]
        node = node.parent
    states += [node.state]
    return actions[::-1], states[::-1], num_explored, num_generated
################################################################
### Node class and helper functions provided for your convenience.
### DO NOT EDIT!
################################################################
def print_solution(solution):
    """
    Prints out the path from the initial state to the goal given
    a tuple of (actions,states) corresponding to the solution.
    """
    actions, states, num_explored, num_generated = solution
    print('Start')
    for step in range(len(actions)):
        print(puzzle.board_str(states[step]))
        print()
        print(actions[step])
        print()
    print('Goal')
    print(puzzle.board_str(states[-1]))
    print()
    print('Number of steps: {:d}'.format(len(actions)))
    print('Nodes explored: {:d}'.format(num_explored))
    print('Nodes generated: {:d}'.format(num_generated))
################################################################
### Skeleton code for your Astar implementation. Fill in here.
################################################################
class Astar:
    """
    A* search.
    - 'problem' is a Puzzle instance.
    """
    def __init__(self, problem):
        self.problem = problem
        self.init_state = problem.init_state
        self.num_explored = 0
        self.num_generated = 1
    def selectState(self, listOfStates):
        '''
        Selects the lowest cost node for expansion based on f(n) = g(n) + h(n)
        '''
        lowestCostPath = listOfStates[0].path_cost
        index = int(1)
        lowestNodeIndex = int(0)
        while index != len(listOfStates):
            scannedPathCost = listOfStates[index].path_cost
            if index < scannedPathCost:
                lowestCostPath = scannedPathCost
                lowestNodeIndex = index
            index += 1
        return listOfStates[lowestNodeIndex]
    def f(self,node, method):
        '''
        Returns a lower bound estimate on the cost from root through node
        to the goal.
        '''
        return node.path_cost + self.h(node, method)
    def getManhattanDistance(self, node):
        '''
        Evaluates the manhattan distance for a given state
        '''
        iterator = int(0)
        misplacedCount = int(0)
        totalDistance = int(0)
        while iterator != len(node.state):
            if iterator != node.state[iterator] and node.state[iterator] != 0:
                misplacedCount = misplacedCount + 1
                xCurrent = int(node.state[iterator]/3)
                yCurrent = int(node.state[iterator]%3)
                xDesired = int(iterator/3)
                yDesired = int(iterator%3)
                totalDistance = totalDistance + int(abs(xCurrent - xDesired)) + int(abs(yCurrent - yDesired))
            iterator = iterator + 1
        return totalDistance + misplacedCount
    def h(self,node, method='man'):
        '''
        Returns a lower bound estimate on the cost from node to the goal
        using the different heuristics.
        '''
        ################################################################
        ### Your code here.
        ################################################################
        if method == 'man':
            self.getManhattanDistance(node)
            return -1
        elif method == 'rowcol':
            return -1 # compute rowcol heuristic
        elif method == 'misplaced':
            return -1 # compute misplaced tiles the number of tiles out of place
        elif method == 'null':
            return -1 # compute null heuristic
        else:
            return 0
    def method_stats(self, board, trials=100, method='man'):
        '''
        Returns an mean and standard deviation of the number of nodes expanded
        '''
        # write code here to randomly generate puzzles and
        # compute the mean and standard deviation of the number
        # nodes expanded. You can use np.mean() and np.std()
        expanded_mean = 0.
        expanded_std = 0.
        for t in range(trials):
            puzzle = Puzzle(board).shuffle()
            solver = Astar(puzzle)
            actions, states, num_explored, num_generated = solver.solve(method=method)
            ############################################################
            ### Compute upper bound for branching factor and update b_hi
            ### Your code here.
            ############################################################
        return expanded_mean, expanded_std
    def anotherFunction(self, node, method):
        return 1
    def generateStatesFor(self, node, method, listOfStates):
        '''
        Decides how to select an action from a list of available actions
        '''
    def solve(self, method='man'):
        node = Node(state = self.init_state,
                    parent = None,
                    action = None,
                    path_cost = 0)
        num_explored = int(0)
        num_generated = int(0)
        listOfStates = []
        listOfStates.append(node)
        print(listOfStates[0].state)
        anotherFunction(self, node, method)
        return retrieve_solution(node, num_explored=num_explored, num_generated=num_generated)
if __name__ == '__main__':
    # Simple puzzle test
    ## board = [[3,1,2],
    ##          [4,0,5],
    ##          [6,7,8]]
    board = [[7,2,4],
             [5,0,6],
             [8,3,1]]
    puzzle = Puzzle(board)
    solver = Astar(puzzle)
    solution = solver.solve()
    ##print_solution(solution)
    # Harder puzzle test
    board = [[7,2,4],
             [5,0,6],
             [8,3,1]]
    puzzle = Puzzle(board)
    solver = Astar(puzzle)
    ##solution = solver.solve()
    ##print(len(solution[0]))
    # branching factor test
    method='man'
    emean, estd = solver.method_stats(board, trials=100, method=method)
    ##print('mean and standard deviation: {0:.2f}, {1:.2f} using heuristic: {2}'.format(emean, estd, method))
The Error code:
Traceback (most recent call last):
  File "/Users/-/Downloads/HW1 2/code/my_hw1.py", line 214, in <module>
    solution = solver.solve()
  File "/Users/-/Downloads/HW1 2/code/my_hw1.py", line 200, in solve
    anotherFunction(self, node, method)
NameError: name 'anotherFunction' is not defined
[Finished in 0.1s with exit code 1]
As you can see, the function calling it is on line 200 and the function is defined at line 185. Any idea what the issue could be? I am also able to call the exact same "anotherFunction" method from other methods which aren't solve. Any tips would be appreciated.

When you define a function with "self" as its first parameter inside a class, it is a method of that class, and you need to call it through an instance of that class. For example, if obj is an instance of the class in which anotherFunction is defined, the syntax would be obj.anotherFunction(node, method); inside the class's own methods, that instance is self, so solve should call self.anotherFunction(node, method). A bare anotherFunction(...) is looked up as a module-level name, which is why you get a NameError. The "self" parameter in the definition indicates that anotherFunction is a member function of whatever class it is defined in, which in the code shown is Astar.
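A minimal sketch of the calling styles described above. The class is a cut-down, hypothetical stand-in for Astar (no puzzle, placeholder node), kept only to show how the method is reached.

```python
class Astar:
    """Cut-down stand-in for the question's Astar class (hypothetical bodies)."""

    def anotherFunction(self, node, method):
        return 1

    def solve(self, method='man'):
        node = None  # placeholder for the root Node
        # Inside the class, reach other methods through self; a bare
        # anotherFunction(...) is looked up as a global name and raises NameError.
        return self.anotherFunction(node, method)


solver = Astar()
print(solver.solve())                              # 1
print(solver.anotherFunction(None, 'man'))         # 1: called through an instance
print(Astar.anotherFunction(solver, None, 'man'))  # 1: via the class, passing the instance explicitly
```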

cx_Freeze Build of Tkinter Application Extremely Buggy

I recently created a small game using tkinter (Python version 3.6.1) and froze it using cx_Freeze. The game has four buttons: an undo button, a restart button, a "find legal moves" button, and a "find best move" button. The "find best move" button uses a shelve database to find the best move for the first three turns and a recursive function that traverses the move tree on the fly for the fourth turn and up. My code disables the buttons when they should not be used.
I made sure to include the necessary DLLs in the setup script and I was able to run the executable without errors. However, three of the buttons are disabled until the fourth turn (when the recursive function begins to be used) and the application is extremely buggy in many other ways. However, it works perfectly when I run the unfrozen version.
I honestly don't know what code snippets I would need to provide to you guys, as this issue has me utterly at a loss. The only clue I have is that the pyc files in the build differ in size from the unfrozen app. I know this is rather vague, but I do not know what specifics would be useful to give. Any help, if possible, would be greatly appreciated.
"Find best move" method:
def _find_best_move(self):
    """Finds best move possible for current game."""
    if len(self.game.moves) <= 3:
        with shelve.open("paths") as db:
            best_paths = db[str(self.game.moves)]
            best_path = choice(best_paths)
    else:
        self.path_finder(self.game)
        best_path = self.path_finder.best_path
    best_move = best_path[len(self.game.moves)]
    best_move = (__class__._add_offset(best_move[0]), best_move[1])
    return best_move
Updates Button State:
def update_gui(self):
    """Updates GUI to reflect current game conditions."""
    legal_moves = self.game.find_legal_moves()
    if self.game.moves:
        self.undo_btn["state"] = "!disabled"
        self.restart_btn["state"] = "!disabled"
        self.best_move_btn["state"] = "!disabled"
    else:
        self.undo_btn["state"] = "disabled"
        self.restart_btn["state"] = "disabled"
    if legal_moves:
        self.show_moves_btn["state"] = "!disabled"
    else:
        self.show_moves_btn["state"] = "disabled"
    if legal_moves and self.game.moves:
        self.best_move_btn["state"] = "!disabled"
    else:
        self.best_move_btn["state"] = "disabled"
My __init__ file:
initpath = os.path.dirname(__file__)
os.chdir(os.path.join(initpath, "data"))
PathFinder class (traverses move tree on the fly):
from functools import lru_cache
class PathFinder:
    """Provides methods to find move paths that meet various criteria.
    Designed to be called after the player makes a move.
    """
    _game = None
    best_path = None
    best_score = None
    def __call__(self, game):
        """Call self as function."""
        if not game:
            self._game = DummyGame()
        elif not isinstance(game, DummyGame):
            self._game = DummyGame(game)
        else:
            self._game = game
        moves = self._game.moves
        self.possible_paths = dict.fromkeys(range(1,9))
        root = Node(moves[-1])
        self._find_paths(root)
        self._find_paths.cache_clear()
        found_scores = [score for score in self.possible_paths.keys() if
                        self.possible_paths[score]]
        self.best_score = min(found_scores)
        self.best_path = self.possible_paths[self.best_score]
    @lru_cache(None)
    def _find_paths(self, node):
        """Finds possible paths and records them in 'possible_paths'."""
        legal_moves = self._game.find_legal_moves()
        if not legal_moves:
            score = self._game.peg_count
            if not self.possible_paths[score]:
                self.possible_paths[score] = self._game.moves.copy()
        else:
            children = []
            for peg in legal_moves:
                for move in legal_moves[peg]:
                    children.append(Node((peg, move)))
            for child in children:
                self._game.move(*child.data)
                self._find_paths(child)
        try:
            self._game.undo()
        except IndexError:
            pass
Peg class:
class Peg(RawPen):
    """A specialized 'RawPen' that represents a peg."""
    def __init__(self, start_point, graphics):
        """Initialize self. See help(type(self)) for accurate signature."""
        self.graphics = graphics
        self.possible_moves = []
        super().__init__(self.graphics.canvas, "circle", _CFG["undobuffersize"],
                         True)
        self.pen(pendown=False, speed=0, outline=2, fillcolor="red",
                 pencolor="black", stretchfactor=(1.25,1.25))
        self.start_point = start_point
        self.goto(start_point)
        self.ondrag(self._remove)
        self.onrelease(self._place)
    def _remove(self, x, y):
        """Removes peg from hole if it has moves."""
        if self.possible_moves:
            self.goto(x,y)
    def _place(self, x, y):
        """Places peg in peg hole if legal."""
        if self.possible_moves:
            target_holes = [tuple(map(add, self.start_point, move)) for move in
                            self.possible_moves]
            distances = [self.distance(hole) for hole in target_holes]
            hole_distances = dict(zip(distances, target_holes))
            nearest_hole = hole_distances[min(hole_distances)]
            if self.distance(nearest_hole) <= 0.45:
                self.goto(nearest_hole)
                peg = self.graphics._subtract_offset(self.start_point)
                move = tuple(map(sub, self.pos(), self.start_point))
                move = tuple(map(int, move))
                self.graphics.game.move(peg, move)
                self.start_point = self.pos()
            else:
                self.goto(self.start_point)

The frozen application is going to have a different value for __file__ than the unfrozen application. You will have to deal with that accordingly! This is a common issue that bites a lot of people. Anything that assumes that the module is found in the file system is going to stop working properly when frozen. The other gotcha is dynamic importing of modules.
The cx_Freeze documentation covers this and other topics that will hopefully help you out!
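A common way to handle the __file__ difference is to resolve data files relative to the executable when frozen. The sketch below follows that pattern; it relies on cx_Freeze setting sys.frozen in frozen builds, and find_data_file is a hypothetical helper name. It also assumes the data directory is actually copied into the build (for example through the build_exe include_files option). Separately, shelve picks its dbm backend at runtime, which is one place the dynamic-import gotcha may bite, so it may be worth checking that the backend module ends up in the build.

```python
import os
import sys


def find_data_file(filename):
    """Return the absolute path of a data file, frozen or not."""
    if getattr(sys, "frozen", False):
        # Frozen build: __file__ no longer points into the source tree,
        # so look for the data next to the executable instead.
        datadir = os.path.dirname(sys.executable)
    else:
        # Running from source: resolve relative to this module.
        datadir = os.path.dirname(os.path.abspath(__file__))
    return os.path.join(datadir, filename)
```

In the question's setup this might replace the os.chdir() in __init__ with something like shelve.open(find_data_file(os.path.join("data", "paths"))), so the code no longer depends on the current working directory.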
