When I run the task, I pass all the tests except that my chosen action is not the same as the optimal one. I don't understand why, or what I need to change in my code. I have tried everything, but there seems to be a fundamental error somewhere in the algorithm that makes it choose Right even though the best action is Left.
def max_value(self, gameState, depth, agentIndex):
    pacIndex = 0  # Pacman's agent index
    bound = float('-inf')  # running best value, starts at -infinity
    maxresult = None
    # check if we have a terminal state or leaf node
    if gameState.isWin() or gameState.isLose() or depth == self.depth:
        return self.evaluationFunction(gameState), None
    if gameState.getNumAgents() == 0:
        value, a = self.max_value(gameState, depth + 1, agentIndex)
    # check all child nodes for Pacman
    for action in gameState.getLegalActions(pacIndex):
        # find each successor of the action node
        successor = gameState.generateSuccessor(pacIndex, action)
        # now we minimize over the next agent (ghost)
        ghost_index = pacIndex + 1
        value, result = self.min_value(successor, depth, ghost_index)
        if value > bound:  # if we found a higher value
            bound = max(value, bound)  # new bound: max of what we have found
            maxresult = action
    # return (bound, maxresult) [depth == 1]  # return actions done for depth 1, now we move down
    return bound, maxresult
def min_value(self, gameState, depth, agentIndex):
    bound = float('inf')  # running best value, starts at +infinity
    minresult = None
    agentNumber = gameState.getNumAgents()  # find out how many agents there are
    # check if we have a terminal state
    if gameState.isLose() or depth == self.depth:
        return self.evaluationFunction(gameState), None
    # get all the legal actions for the agent we are currently on
    for action in gameState.getLegalActions(agentIndex):
        # create the next nodes one step down
        successor = gameState.generateSuccessor(agentIndex, action)
        # check if this is the last ghost
        if agentIndex == agentNumber - 1:
            # then we want to maximize for Pacman (recursive)
            pacIndex = 0
            value, maxresult = self.max_value(successor, depth + 1, pacIndex)
        else:
            # for the remaining ghosts
            value, result = self.min_value(successor, depth, agentIndex + 1)
        if value < bound:
            bound = min(value, bound)
            minresult = action
    return bound, minresult

# now the minimax function
def minimax(self, gameState):
    depth = 0
    a, maxresult = self.max_value(gameState, 0, 0)
    return maxresult
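For comparison, here is a compact single-function version of the same depth-limited minimax, written against the same GameState API used above (getNumAgents, getLegalActions, generateSuccessor, isWin, isLose, self.depth, self.evaluationFunction). It is a sketch for cross-checking values, not the poster's required structure; minimax_value is a hypothetical name.

# A compact cross-check sketch (hypothetical helper, not the poster's code).
def minimax_value(self, state, depth, agentIndex):
    # terminal test: win, loss, or depth cutoff
    if state.isWin() or state.isLose() or depth == self.depth:
        return self.evaluationFunction(state), None
    nextAgent = (agentIndex + 1) % state.getNumAgents()
    # the ply counter advances only when the turn wraps back to Pacman
    nextDepth = depth + 1 if nextAgent == 0 else depth
    best = None
    for action in state.getLegalActions(agentIndex):
        value, _ = self.minimax_value(state.generateSuccessor(agentIndex, action),
                                      nextDepth, nextAgent)
        # agent 0 (Pacman) maximizes, every other agent (a ghost) minimizes
        if (best is None
                or (agentIndex == 0 and value > best[0])
                or (agentIndex != 0 and value < best[0])):
            best = (value, action)
    # no legal actions at a non-terminal state: fall back to the evaluation
    return best if best is not None else (self.evaluationFunction(state), None)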
I'm working on a Mancala game where players play against an AI. The code is complete: the Mancala game functionality is in Mancala_helpers, the AI algorithm is a minimax tree in the MinMax file, and finally there is the game itself. Everything runs fine except when the AI plays: if the AI starts, the game ends immediately, because it moves all the rocks from its pits in one round; and if I start, I can only play one move before it does the same. I cannot understand what's happening. At first I thought I had a problem in the Mancala helpers, where turns were not switched properly and the AI kept playing, but I ran multiple tests and that part works fine. I can't identify the issue; help, please. If anyone also has suggestions for a better evaluation function, that would be great. Thanks.
--------------------------Mancala helpers--------------
# TODO: implement pad(num)
# Return a string representation of num that is always two characters wide.
# If num does not already have two digits, a leading "0" is inserted in front.
# This is called "padding". For example, pad(12) is "12", and pad(1) is "01".
# You can assume num is either one or two digits long.
def pad(num: int) -> str:
    x = str(num)
    if len(x) > 1:
        return x
    else:
        return "0" + x
# TODO: implement pad_all(nums)
# Return a new list whose elements are padded versions of the elements in nums.
# For example, pad_all([12, 1]) should return ["12", "01"].
# Your code should create a new list, and not modify the original list.
# You can assume each element of nums is an int with one or two digits.
def pad_all(nums: list) -> list:
    x = []
    for i in nums:
        x.append(pad(i))
    return x
# TODO: implement initial_state()
# Return a (player, board) tuple representing the initial game state
# The initial player is player 0.
# board is list of ints representing the initial mancala board at the start of the game.
# The list element at index p should be the number of gems at position p.
def initial_state() -> tuple:
    return (0, [4, 4, 4, 4, 4, 4, 0, 4, 4, 4, 4, 4, 4, 0])
# TODO: implement game_over(state)
# Return True if the game is over, and False otherwise.
# The game is over once all pits are empty.
# Your code should not modify the board list.
# The built-in functions "any" and "all" may be useful:
# https://docs.python.org/3/library/functions.html#all
def game_over(state: tuple) -> bool:
    lst = state[1]
    # the game ends once either player's six pits are all empty
    return all(lst[p] == 0 for p in pits_of(0)) or all(lst[p] == 0 for p in pits_of(1))
# TODO: implement valid_actions(state)
# state is a (player, board) tuple
# Return a list of all positions on the board where the current player can pick up gems.
# A position is a valid move if it is one of the player's pits and has 1 or more gems in it.
# For example, if all of player's pits are empty, you should return [].
# The positions in the returned list should be ordered from lowest to highest.
# Your code should not modify the board list.
def valid_actions(state: tuple) -> list:
    actions = []
    lst = state[1]
    player = state[0]
    if player == 0:
        for i in range(6):
            if lst[i] > 0:
                actions.append(i)
        return actions
    else:
        for i in range(6):
            if lst[i + 7] > 0:
                actions.append(i + 7)
        return actions
# TODO: implement mancala_of(player)
# Return the numeric position of the given player's mancala.
# Player 0's mancala is on the right and player 1's mancala is on the left.
# You can assume player is either 0 or 1.
def mancala_of(player: int) -> int:
    if player == 0:
        return 6
    elif player == 1:
        return 13
# TODO: implement pits_of(player)
# Return a list of numeric positions corresponding to the given player's pits.
# The positions in the list should be ordered from lowest to highest.
# Player 0's pits are on the bottom and player 1's pits are on the top.
# You can assume player is either 0 or 1.
def pits_of(player: int) -> list:
    if player == 0:
        return [0, 1, 2, 3, 4, 5]
    elif player == 1:
        return [7, 8, 9, 10, 11, 12]
# TODO: implement player_who_can_do(move)
# Return the player (either 0 or 1) who is allowed to perform the given move.
# The move is allowed if it is the position of one of the player's pits.
# For example, position 2 is one of player 0's pits.
# So player_who_can_do(2) should return 0.
# You can assume that move is a valid position for one of the players.
def player_who_can_do(move: int) -> int:
    if move in [0, 1, 2, 3, 4, 5]:
        return 0
    elif move in [7, 8, 9, 10, 11, 12]:
        return 1
# TODO: implement opposite_from(position)
# Return the position of the pit that is opposite from the given position.
# Check the pdf instructions for the definition of "opposite".
def opposite_from(position: int) -> int:
    # every opposite pair sums to 12
    opposites = {0: 12, 1: 11, 2: 10, 3: 9, 4: 8, 5: 7,
                 7: 5, 8: 4, 9: 3, 10: 2, 11: 1, 12: 0}
    return opposites[position]
# TODO: implement play_turn(move, board)
# Return the new game state after the given move is performed on the given board.
# The return value should be a tuple (new_player, new_board).
# new_player should be the player (0 or 1) whose turn it is after the move.
# new_board should be a list representing the new board state after the move.
#
# Parameters:
# board is a list representing the current state of the game board before the turn is taken.
# move is an int representing the position where the current player picks up gems.
# You can assume that move is a valid move for the current player who is taking their turn.
# Check the pdf instructions for the detailed rules of taking a turn.
#
# It may be helpful to use several of the functions you implemented above.
# You will also need control flow such as loops and if-statements.
# Lastly, the % (modulo) operator may be useful:
# (x % y) returns the remainder of x / y
# from: https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex
def play_turn(move: int, board: list) -> tuple:
    player = player_who_can_do(move)
    new_board = board
    gems = new_board[move]
    new_board[move] = 0
    hasht = {}
    hasht[0] = 1
    hasht[1] = 0
    if player == 0:
        x = 0
        offset = 1
        gems_counter = gems
        for i in range(gems):
            if i + move + offset == 13: offset += 1
            elif (i + move + offset) - 14 == 13: offset += 1
            if i + move + offset > 13:
                gem_position = (i + move + offset) - 14
            else:
                gem_position = i + move + offset
            new_board[gem_position] += 1
            gems_counter -= 1
            if gems_counter == 0 and gem_position == 6: x = 1
            if gems_counter == 0 and gem_position in pits_of(0) and new_board[gem_position] == 1 and new_board[opposite_from(gem_position)] > 0:
                gems_from_myside = new_board[gem_position]
                gems_from_opside = new_board[opposite_from(gem_position)]
                new_board[6] = gems_from_myside + gems_from_opside
                new_board[gem_position] = 0
                new_board[opposite_from(gem_position)] = 0
        return (hasht[x], new_board)
    if player == 1:
        x_2 = 1
        offset = 1
        gems_counter2 = gems
        for i in range(gems):
            if i + move + offset == 6: offset += 1
            elif (i + move + offset) - 14 == 6: offset += 1
            if i + move + offset > 13:
                gem_position = (i + move + offset) - 14
            else:
                gem_position = i + move + offset
            new_board[gem_position] += 1
            gems_counter2 -= 1
            if gems_counter2 == 0 and gem_position == 13: x_2 = 0
            if gems_counter2 == 0 and gem_position in pits_of(1) and new_board[gem_position] == 1 and new_board[opposite_from(gem_position)] > 0:
                gems_from_myside = new_board[gem_position]
                gems_from_opside = new_board[opposite_from(gem_position)]
                new_board[13] = gems_from_myside + gems_from_opside
                new_board[gem_position] = 0
                new_board[opposite_from(gem_position)] = 0
        return (hasht[x_2], new_board)
# TODO: implement clear_pits(board)
# Return a new list representing the game state after clearing the pits from the board.
# When clearing pits, any gems in a player's pits get moved to that player's mancala.
# Check the pdf instructions for more detail about clearing pits.
def clear_pits(board: list) -> list:
    length = len(board)
    middle_index = length // 2
    first_half = board[:middle_index]
    second_half = board[middle_index:]
    for i in range(6):
        first_half[6] += first_half[i]
        first_half[i] = 0
        second_half[6] += second_half[i]
        second_half[i] = 0
    return (first_half + second_half)
# This one is done for you.
# Plays a turn and clears pits if needed.
def perform_action(action, state):
    player, board = state
    new_player, new_board = play_turn(action, board)
    if 0 in [len(valid_actions((0, new_board))), len(valid_actions((1, new_board)))]:
        new_board = clear_pits(new_board)
    return new_player, new_board
# TODO: implement score_in(state)
# state is a (player, board) tuple
# Return the score in the given state.
# The score is the number of gems in player 0's mancala, minus the number of gems in player 1's mancala.
def score_in(state: tuple) -> int:
    lst = state[1]
    return lst[6] - lst[13]
# TODO: implement is_tied(board)
# Return True if the game is tied in the given board state, False otherwise.
# A game is tied if both players have the same number of gems in their mancalas.
# You can assume all pits have already been cleared on the given board.
def is_tied(board: list) -> bool:
    return board[mancala_of(0)] == board[mancala_of(1)]
# TODO: implement winner_of(board)
# Return the winning player (either 0 or 1) in the given board state.
# The winner is the player with more gems in their mancala.
# You can assume it is not a tied game, and all pits have already been cleared.
def winner_of(board: list) -> int:
    if board[mancala_of(0)] > board[mancala_of(1)]:
        return 0
    elif board[mancala_of(0)] < board[mancala_of(1)]:
        return 1
# TODO: implement string_of(board)
def string_of(board: list) -> str:
    new_board = pad_all(board)
    return '\n {} {} {} {} {} {}\n {} {}\n {} {} {} {} {} {}\n'.format(
        new_board[12], new_board[11], new_board[10], new_board[9], new_board[8], new_board[7],
        new_board[13], new_board[6],
        new_board[0], new_board[1], new_board[2], new_board[3], new_board[4], new_board[5])
-----------------------MinMax AI-------------------------------------------------------------
from os import stat
import numpy as np
from mancala_helpers import *
# A simple evaluation function that simply uses the current score.
def simple_evaluate(state):
    return score_in(state)
# TODO
# Implement a better evaluation function that outperforms the simple one.
def better_evaluate(state):
    # lst = state[1]
    # return score_in(state) / 2
    return None
# depth-limited minimax as covered in lecture
def minimax(state, max_depth, evaluate):
    # returns chosen child state, utility
    # base cases
    if game_over(state): return None, score_in(state)
    if max_depth == 0: return None, evaluate(state)
    # recursive case
    children = [perform_action(action, state) for action in valid_actions(state)]
    results = [minimax(child, max_depth - 1, evaluate) for child in children]
    _, utilities = zip(*results)
    player, board = state
    if player == 0: action = np.argmax(utilities)
    if player == 1: action = np.argmin(utilities)
    return children[action], utilities[action]
# runs a competitive game between two AIs:
# better_evaluation (as player 0) vs simple_evaluation (as player 1)
def compete(max_depth, verbose=True):
    state = initial_state()
    while not game_over(state):
        player, board = state
        if verbose: print(string_of(board))
        if verbose: print("--- %s's turn --->" % ["Better", "Simple"][player])
        state, _ = minimax(state, max_depth, [better_evaluate, simple_evaluate][player])
    score = score_in(state)
    player, board = state
    if verbose:
        print(string_of(board))
        print("Final score: %d" % score)
    return score

if __name__ == "__main__":
    score = compete(max_depth=4, verbose=True)
----------------------------------------playing the game---------------------------
from os import stat
from mancala_helpers import *
from mancala_minimax import minimax, simple_evaluate
def get_user_action(state):
    actions = list(map(str, valid_actions(state)))
    player, board = state
    prompt = "Player %d, choose an action (%s): " % (player, ",".join(actions))
    while True:
        action = input(prompt)
        if action in actions: return int(action)
        print("Invalid action, try again.")
if __name__ == "__main__":
    max_depth = 1
    state = initial_state()
    while not game_over(state):
        player, board = state
        print(string_of(board))
        if player == 0:
            action = get_user_action(state)
            state = perform_action(action, state)
        else:
            print("--- AI's turn --->")
            # print(string_of(board))
            print(state)
            print(max_depth)
            state, _ = minimax(state, max_depth, simple_evaluate)
            # print(string_of(board))
    player, board = state
    print(string_of(board))
    if is_tied(board):
        print("Game over, it is tied.")
    else:
        winner = winner_of(board)
        print("Game over, player %d wins." % winner)
The entire problem was in the Mancala helper file, in the second line of play_turn(): new_board = board. This caused the issue because the original state must remain unchanged for the search to work properly, yet any changes to new_board were also affecting board. Replacing it with new_board = copy.deepcopy(board) fixed everything: copy.deepcopy() creates a completely independent copy, so changes applied to one do not affect the other.
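To see the aliasing problem in isolation (a standalone toy demo, not the game code):

import copy

board = [4, 4, 4, 4]
alias = board                        # both names refer to the SAME list object
alias[0] = 0
print(board)                         # [0, 4, 4, 4] -- the "original" changed too

board = [4, 4, 4, 4]
independent = copy.deepcopy(board)   # a fully independent copy
independent[0] = 0
print(board)                         # [4, 4, 4, 4] -- untouched

Since the Mancala board is a flat list of ints, a shallow copy such as board[:] or list(board) would also have worked; copy.deepcopy() only becomes necessary for nested structures. As for the request for evaluation-function ideas, one illustrative heuristic (an untested assumption on my part, not part of the assignment) keeps the mancala difference but adds a small weight for gems still sitting on each player's side, since those can still be banked or captured:

# Illustrative sketch only -- the 0.5 weight is a guess, to be tuned by self-play.
def better_evaluate(state):
    player, board = state
    side0 = sum(board[p] for p in pits_of(0))   # gems still on player 0's side
    side1 = sum(board[p] for p in pits_of(1))
    return score_in(state) + 0.5 * (side0 - side1)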
I am working on learning Q-tables and worked through a simple version that only used a 1-dimensional array to move forward and backward. Now I am trying 4-direction movement and got stuck on controlling the agent.
I have the random movement down, and it will eventually find the goal, but I want it to learn how to get to the goal instead of randomly stumbling onto it. So I would appreciate any advice on adding Q-learning to this code. Thank you.
Here is my full code, as it is stupid simple right now.
import numpy as np
import random
import math
world = np.zeros((5,5))
print(world)
# Make sure the goal can never be (0, 0), i.e. the start point
goal_x = random.randint(1,4)
goal_y = random.randint(1,4)
goal = (goal_x, goal_y)
print(goal)
world[goal] = 1
print(world)
LEFT = 0
RIGHT = 1
UP = 2
DOWN = 3
map_range_min = 0
map_range_max = 5
class Agent:
    def __init__(self, current_position, my_goal, world):
        self.current_position = current_position
        self.last_position = current_position
        self.visited_positions = []
        self.goal = my_goal
        self.last_reward = 0
        self.totalReward = 0
        self.q_table = world

    # Update the total reward by the reward
    def updateReward(self, extra_reward):
        # This will either increase or decrease the total reward for the episode
        x = (self.goal[0] - self.current_position[0]) ** 2
        y = (self.goal[1] - self.current_position[1]) ** 2
        dist = math.sqrt(x + y)
        complete_reward = dist + extra_reward
        self.totalReward += complete_reward

    def validate_move(self):
        valid_move_set = []
        # Check the x range (valid grid indices run 0..map_range_max - 1,
        # so the strict upper bound here is map_range_max - 1)
        if map_range_min < self.current_position[0] < map_range_max - 1:
            valid_move_set.append(LEFT)
            valid_move_set.append(RIGHT)
        elif map_range_min == self.current_position[0]:
            valid_move_set.append(RIGHT)
        else:
            valid_move_set.append(LEFT)
        # Check the y range
        if map_range_min < self.current_position[1] < map_range_max - 1:
            valid_move_set.append(UP)
            valid_move_set.append(DOWN)
        elif map_range_min == self.current_position[1]:
            valid_move_set.append(DOWN)
        else:
            valid_move_set.append(UP)
        return valid_move_set
    # Make the agent move
    def move_right(self):
        self.last_position = self.current_position
        x = self.current_position[0]
        x += 1
        y = self.current_position[1]
        return (x, y)

    def move_left(self):
        self.last_position = self.current_position
        x = self.current_position[0]
        x -= 1
        y = self.current_position[1]
        return (x, y)

    def move_down(self):
        self.last_position = self.current_position
        x = self.current_position[0]
        y = self.current_position[1]
        y += 1
        return (x, y)

    def move_up(self):
        self.last_position = self.current_position
        x = self.current_position[0]
        y = self.current_position[1]
        y -= 1
        return (x, y)

    def move_agent(self):
        move_set = self.validate_move()
        randChoice = random.randint(0, len(move_set) - 1)
        move = move_set[randChoice]
        if move == UP:
            return self.move_up()
        elif move == DOWN:
            return self.move_down()
        elif move == RIGHT:
            return self.move_right()
        else:
            return self.move_left()
    # Update the rewards.
    # Returns True to keep the episode going, False to end it.
    def checkPosition(self):
        if self.current_position == self.goal:
            print("Found Goal")
            self.updateReward(10)
            return False
        else:
            # Choose a new direction
            self.current_position = self.move_agent()
            self.visited_positions.append(self.current_position)
            # Currently get nothing for not reaching the goal
            self.updateReward(0)
            return True

gus = Agent((0, 0), goal, world)  # the original call omitted the world argument
play = gus.checkPosition()
while play:
    play = gus.checkPosition()
print(gus.totalReward)
I have a few suggestions based on your code example:
separate the environment from the agent. The environment needs to have a method of the form new_state, reward = env.step(old_state, action). This method is saying how an action transforms your old state into a new state. It's a good idea to encode your states and actions as simple integers. I strongly recommend setting up unit tests for this method.
the agent then needs to have an equivalent method action = agent.policy(state, reward). As a first pass, you should manually code an agent that does what you think is right. e.g., it might just try to head towards the goal location.
consider the issue of whether the state representation is Markovian. If you could do better at the problem if you had a memory of all the past states you visited, then the state doesn't have the Markov property. Preferably, the state representation should be compact (the smallest set that is still Markovian).
once this structure is set up, you can then think about actually learning a Q table. One possible method (easy to understand, though not necessarily efficient) is Monte Carlo with either exploring starts or epsilon-soft greedy policies. A good RL book should give pseudocode for either variant; a minimal Q-learning sketch of this overall structure follows these suggestions.
When you are feeling confident, head to openai gym https://www.gymlibrary.dev/ for some more detailed class structures. There are some hints about creating your own environments here: https://www.gymlibrary.dev/content/environment_creation/
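To make the suggestions concrete, here is a minimal sketch of that structure with tabular Q-learning and epsilon-greedy exploration. All names (GridEnv, QAgent) and reward values are illustrative choices of mine, not part of the original code:

import random

class GridEnv:
    # 5x5 grid; states are integers 0..24, actions are 0=LEFT, 1=RIGHT, 2=UP, 3=DOWN
    def __init__(self, size=5, goal=(3, 3)):
        self.size = size
        self.goal = goal

    def step(self, state, action):
        x, y = state % self.size, state // self.size
        if action == 0: x = max(x - 1, 0)
        elif action == 1: x = min(x + 1, self.size - 1)
        elif action == 2: y = max(y - 1, 0)
        elif action == 3: y = min(y + 1, self.size - 1)
        new_state = y * self.size + x
        reward = 10 if (x, y) == self.goal else -1   # -1 per step favors short paths
        return new_state, reward

class QAgent:
    def __init__(self, n_states=25, n_actions=4, alpha=0.5, gamma=0.9, eps=0.1):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.n_actions = n_actions
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def policy(self, state):
        if random.random() < self.eps:                   # explore
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.q[state][a])  # exploit

    def update(self, s, a, r, s2):
        # Q-learning backup: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        self.q[s][a] += self.alpha * (r + self.gamma * max(self.q[s2]) - self.q[s][a])

env, agent = GridEnv(), QAgent()
for episode in range(500):
    state = 0                                            # start in the corner
    for _ in range(100):                                 # cap episode length
        action = agent.policy(state)
        new_state, reward = env.step(state, action)
        agent.update(state, action, reward, new_state)
        state = new_state
        if reward == 10:                                 # reached the goal
            break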
I am trying to make an ultimate tic-tac-toe game in Python which differs from the standard version in that the game ends as soon as there is a win on any one sub-board. I am using the minimax algorithm with alpha-beta pruning to find the best move for the bot to play. The problem is that when I run the code and it is time for the bot to play its move, it runs endlessly without coming to a conclusion and returning a best_move.
The communication with the board is already handled. All I need is the best value, and once I get that, I can retrieve the index from that state.
Initially, once the game is started, the user is prompted to make a move from 1-9, which is then fed to the function:
Boards is a list of lists which contains the state of each sub-board.
# choose a move to play
def play1(user_move):
    # print_board(boards)
    boards_list = main_boards.tolist()
    player = 1
    depth = 20
    end_move = make_bot_move(boards_list, user_move, player, depth)
    place(curr, end_move, 1)
    return end_move
The make_bot_move function takes the position of the human and figures out in which sub-board it should play its best_move:
def make_bot_move(state, user_move, player, depth):
    # sub_board = state[user_move]
    # if suboptimal(state, user_move, player) != 0:
    #     return suboptimal(state, user_move, player)
    pseudo_states = successors(state, player, user_move)
    best_move = (-inf, None)
    alpha = -inf
    beta = inf
    for x in pseudo_states:
        state = x[0]
        index = x[1]
        val = minimax(state, index, opponent(player), depth - 1, alpha, beta)
        if val > best_move[0]:
            best_move = (val, index)
            # print("val = ", val)
            # print_board(s[0])
    return best_move[1]
The successors function returns the possible states where it can play its move:
def successors(boards, player, user_move):
    sub_board = boards[user_move]
    value_index = []
    possible_states = []
    for idx, value in enumerate(sub_board):
        if value == 0 and idx != 0:
            value_index.append(idx)
            copied_board = deepcopy(boards)
            possible_states.append(get_possible_state(copied_board, user_move, idx, player))
    # print(possible_states)
    return zip(possible_states, value_index)
Finally, every possible move is fed to minimax function which returns a val of the best move:
def minimax(state, last_move, player, depth, alpha, beta):
    if depth <= 0 or get_game_status(state, player) != 0:
        return evaluate(state, opponent(player))
    if player == 1:
        max_eval = -inf
        pseudo_states = successors(state, player, last_move)
        for x in pseudo_states:
            state = x[0]
            index = x[1]
            print(depth)
            # print_board(np.array(state))
            eval = minimax(state, index, opponent(player), depth - 1, alpha, beta)
            max_eval = max(max_eval, eval)
            alpha = max(alpha, eval)
            if beta <= alpha:
                break
        # print_board(np.array(state))
        return max_eval
    if player == 2:
        min_eval = inf
        pseudo_states = successors(state, player, last_move)
        for x in pseudo_states:
            state = x[0]
            index = x[1]
            print(depth)
            # print_board(np.array(state))
            eval = minimax(state, index, opponent(player), depth - 1, alpha, beta)
            min_eval = min(min_eval, eval)
            beta = min(beta, eval)
            if beta <= alpha:
                break
        # print_board(np.array(state))
        return min_eval
To know whether someone has WON || LOSS || DRAW, get_game_status function is called inside minimax function:
def get_game_status(state, player):
    other_player = opponent(player)
    for each_box in state[1:10]:
        win_state = [
            [each_box[1], each_box[2], each_box[3]],
            [each_box[4], each_box[5], each_box[6]],
            [each_box[7], each_box[8], each_box[9]],
            [each_box[1], each_box[4], each_box[7]],
            [each_box[2], each_box[5], each_box[8]],
            [each_box[3], each_box[6], each_box[9]],
            [each_box[1], each_box[5], each_box[9]],
            [each_box[3], each_box[5], each_box[7]],
        ]
        if [player, player, player] in win_state:
            return player
        elif [other_player, other_player, other_player] in win_state:
            return other_player
        else:
            return 0
And the scoring is handled using evaluate function:
def evaluate(state, player):
    if get_game_status(state, player) and player == 1:
        score = 10
    elif get_game_status(state, player) and player == 2:
        score = -10
    else:
        score = 0
    return score
The expected result is to get the best move, but instead it runs endlessly.
Kindly suggest what changes I should make, or where I am going wrong.
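For scale, a back-of-the-envelope estimate (my own, assuming up to nine open cells per sub-board): a depth-20 game tree has up to 9^20 ≈ 1.2 × 10^19 nodes, and even perfectly ordered alpha-beta pruning only reduces that to roughly 9^(20/2) ≈ 3.5 × 10^9 nodes, so a search this deep cannot finish unless most branches hit a terminal state very early.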
I have a problem. I am trying to build an artificial-intelligence game, and I copied a player and made it inherit from the previous one (class Player(SelectiveAlphaBeta.Player)). For now the search in the game tree is the same (it only prints the maximum and minimum scores of each level, just to help me choose the right threshold). However, it crashes in:
class Player(SelectiveAlphaBeta.Player):
    def __init__(self, setup_time, player_color, time_per_k_turns, k):
        SelectiveAlphaBeta.Player.__init__(self, setup_time, player_color, time_per_k_turns, k, 0.25)  # TODO: w

    def get_move(self, board_state, possible_moves):
        self.clock = time.process_time()
        self.time_for_current_move = self.time_remaining_in_round / self.turns_remaining_in_round - 0.05
        if len(possible_moves) == 1:
            return possible_moves[0]
        current_depth = 1
        prev_alpha = -INFINITY
        # Choosing an arbitrary move:
        best_move = possible_moves[0]
        if (self.w < 1):
            minimax = MiniMaxWithAlphaBetaPruningWithWDeepeningUntilRestfulness
            (self.utility, self.color, self.no_more_time, self.w)
        else:  # self.w == 1
            minimax = MiniMaxWithAlphaBetaPruning(self.utility, self.color, self.no_more_time)
        time_last_move = 0
        print('debugger - line 1')
        # Iterative deepening until the time runs out.
        while True:
            print('{} going to depth: {}, remaining time: {}, prev_alpha: {}, best_move: {}'.format(
                self.__repr__(), current_depth, self.time_for_current_move - (time.process_time() - self.clock),
                prev_alpha, best_move))
            time_before = time.process_time()
            time_left = self.time_for_current_move - (time.process_time() - self.clock)
            # if (time_last_move <= time_left):
            try:
                print('debugger - line 2')
                (alpha, move), run_time = run_with_limited_time(
                    minimax.search, (board_state, current_depth, -INFINITY, INFINITY, True), {},
                    time_left)
                print('debugger - line 3')
            except (ExceededTimeError):
                print('no more time')
                break
            except (MemoryError):
                print('no more memory')
                break
            # else:
            #     print('{} has no enough time ({}) left to go deeper'.format(self.__repr__(), time_left))
            #     break
            time_after = time.process_time()
            time_last_move = time_after - time_before
            if self.no_more_time():
                print('no more time')
                break
            prev_alpha = alpha
            best_move = move
            if alpha == INFINITY:
                print('the move: {} will guarantee victory.'.format(best_move))
                break
            if alpha == -INFINITY:
                print('all is lost')
                break
            current_depth += 1
        if self.turns_remaining_in_round == 1:
            self.turns_remaining_in_round = self.k
            self.time_remaining_in_round = self.time_per_k_turns
        else:
            self.turns_remaining_in_round -= 1
            self.time_remaining_in_round -= (time.process_time() - self.clock)
        return best_move

    def utility(self, state):
        return SelectiveAlphaBeta.Player.utility(self, state)

    def no_more_time(self):
        return SelectiveAlphaBeta.Player.no_more_time(self)

    def __repr__(self):
        return '{} {}'.format(abstract.AbstractPlayer.__repr__(self), 'SelectiveAlphaBetaWithRestfulness{}'.format(str(self.w)))
Nothing is missing, because this is the signature of the function:
class MiniMaxWithAlphaBetaPruningWithW(MiniMaxWithAlphaBetaPruning):
    def __init__(self, utility, my_color, no_more_time, w):
        MiniMaxWithAlphaBetaPruning.__init__(self, utility, my_color, no_more_time)
        self.w = w

    def search(self, state, depth, alpha, beta, maximizing_player):
        """Start the MiniMax algorithm.

        :param state: The state to start from.
        :param depth: The maximum allowed depth for the algorithm.
        :param alpha: The alpha of the alpha-beta pruning.
        :param beta: The beta of the alpha-beta pruning.
        :param maximizing_player: Whether this is a max node (True) or a min node (False).
        :return: A tuple: (The alpha-beta algorithm value, The move in case of max node or None in min mode)
        """
        if depth == 0 or self.no_more_time():
            return self.utility(state), None

        next_moves = state.legalMoves()
        if not next_moves:
            # This player has no moves. So the previous player is the winner.
            return INFINITY if state.currPlayer != self.my_color else -INFINITY, None

        list = []
        for next_move in next_moves:
            if (self.no_more_time()):
                del list[:]
                return self.utility(state), None
            new_state = copy.deepcopy(state)
            new_state.doMove(next_move)
            list.append((new_state, next_move, self.utility(new_state)))
        list.sort(key=itemgetter(2))
        if (self.no_more_time()):
            del list[:]
            return self.utility(state), None

        if maximizing_player:
            selected_move = next_moves[0]
            best_move_utility = -INFINITY
            for i in range(int(len(list)) - 1, int(len(list)) - int(len(list) * self.w) - 1, -1):
                minimax_value, _ = self.search(list[i][0], depth - 1, alpha, beta, False)
                alpha = max(alpha, minimax_value)
                if minimax_value > best_move_utility:
                    best_move_utility = minimax_value
                    selected_move = list[i][1]
                if beta <= alpha or self.no_more_time():
                    break
            del list[:]
            return alpha, selected_move
        else:
            for i in range(0, int(len(list) * self.w)):
                beta = min(beta, self.search(list[i][0], depth - 1, alpha, beta, True)[0])
                if beta <= alpha or self.no_more_time():
                    break
            del list[:]
            return beta, None
class MiniMaxWithAlphaBetaPruningWithWDeepeningUntilRestfulness(MiniMaxWithAlphaBetaPruning):
    def __init__(self, utility, my_color, no_more_time, w):
        MiniMaxWithAlphaBetaPruningWithW.__init__(self, utility, my_color, no_more_time, w)
        # self.treshold_restfulness = TODO

    def search(self, state, depth, alpha, beta, maximizing_player):
        """Start the MiniMax algorithm.

        :param state: The state to start from.
        :param depth: The maximum allowed depth for the algorithm.
        :param alpha: The alpha of the alpha-beta pruning.
        :param beta: The beta of the alpha-beta pruning.
        :param maximizing_player: Whether this is a max node (True) or a min node (False).
        :return: A tuple: (The alpha-beta algorithm value, The move in case of max node or None in min mode)
        """
        print('debugger - line 4')
        if depth == 0 or self.no_more_time():
            return self.utility(state), None

        next_moves = state.legalMoves()
        if not next_moves:
            # This player has no moves. So the previous player is the winner.
            return INFINITY if state.currPlayer != self.my_color else -INFINITY, None

        list = []
        for next_move in next_moves:
            if (self.no_more_time()):
                del list[:]
                return self.utility(state), None
            new_state = copy.deepcopy(state)
            new_state.doMove(next_move)
            list.append((new_state, next_move, self.utility(new_state)))
        list.sort(key=itemgetter(2))
        if (self.no_more_time()):
            del list[:]
            return self.utility(state), None

        if maximizing_player:
            selected_move = next_moves[0]
            best_move_utility = -INFINITY
            for i in range(int(len(list)) - 1, int(len(list)) - int(len(list) * self.w) - 1, -1):
                minimax_value, _ = self.search(list[i][0], depth - 1, alpha, beta, False)
                alpha = max(alpha, minimax_value)
                if minimax_value > best_move_utility:
                    best_move_utility = minimax_value
                    selected_move = list[i][1]
                if beta <= alpha or self.no_more_time():
                    break
            print('Utility of best Move in deepening in depth of {} is {}'.format(depth, minimax_value))
            del list[:]
            return alpha, selected_move
        else:
            for i in range(0, int(len(list) * self.w)):
                beta = min(beta, self.search(list[i][0], depth - 1, alpha, beta, True)[0])
                if beta <= alpha or self.no_more_time():
                    break
            del list[:]
            return beta, None
The error message is:
Exception in thread Thread-6:
Traceback (most recent call last):
  File "C:\Python34\lib\threading.py", line 921, in _bootstrap_inner
    self.run()
  File "C:\Python34\lib\threading.py", line 869, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\Eli\workspace\HW2\amazons\utils.py", line 36, in function_wrapper
    result = func(*args, **kwargs)
TypeError: search() missing 1 required positional argument: 'maximizing_player'
For convenience, the original player:
class Player(players.simple_player.Player):
    def __init__(self, setup_time, player_color, time_per_k_turns, k, w):
        players.simple_player.Player.__init__(self, setup_time, player_color, time_per_k_turns, k)
        self.w = w

    def get_move(self, board_state, possible_moves):
        self.clock = time.process_time()
        self.time_for_current_move = self.time_remaining_in_round / self.turns_remaining_in_round - 0.05
        if len(possible_moves) == 1:
            return possible_moves[0]
        current_depth = 1
        prev_alpha = -INFINITY
        # Choosing an arbitrary move:
        best_move = possible_moves[0]
        if (self.w < 1):
            minimax = MiniMaxWithAlphaBetaPruningWithW(self.utility, self.color, self.no_more_time, self.w)
        else:  # self.w == 1
            minimax = MiniMaxWithAlphaBetaPruning(self.utility, self.color, self.no_more_time)
        time_last_move = 0
        # Iterative deepening until the time runs out.
        while True:
            print('{} going to depth: {}, remaining time: {}, prev_alpha: {}, best_move: {}'.format(
                self.__repr__(), current_depth, self.time_for_current_move - (time.process_time() - self.clock),
                prev_alpha, best_move))
            time_before = time.process_time()
            time_left = self.time_for_current_move - (time.process_time() - self.clock)
            # if (time_last_move <= time_left):
            try:
                (alpha, move), run_time = run_with_limited_time(
                    minimax.search, (board_state, current_depth, -INFINITY, INFINITY, True), {},
                    time_left)
            except (ExceededTimeError):
                print('no more time')
                break
            except (MemoryError):
                print('no more memory')
                break
            # else:
            #     print('{} has no enough time ({}) left to go deeper'.format(self.__repr__(), time_left))
            #     break
            time_after = time.process_time()
            time_last_move = time_after - time_before
            if self.no_more_time():
                print('no more time')
                break
            prev_alpha = alpha
            best_move = move
            if alpha == INFINITY:
                print('the move: {} will guarantee victory.'.format(best_move))
                break
            if alpha == -INFINITY:
                print('all is lost')
                break
            current_depth += 1
        if self.turns_remaining_in_round == 1:
            self.turns_remaining_in_round = self.k
            self.time_remaining_in_round = self.time_per_k_turns
        else:
            self.turns_remaining_in_round -= 1
            self.time_remaining_in_round -= (time.process_time() - self.clock)
        return best_move
And for convenience, run_with_limited_time:
def run_with_limited_time(func, args, kwargs, time_limit):
    """Runs a function with a time limit.

    :param func: The function to run.
    :param args: The function's args, given as a tuple.
    :param kwargs: The function's keywords, given as a dict.
    :param time_limit: The time limit in seconds (can be float).
    :return: A tuple: The function's return value unchanged, and the running time for the function.
    :raises PlayerExceededTimeError: If the player exceeded its given time.
    """
    q = Queue()
    t = Thread(target=function_wrapper, args=(func, args, kwargs, q))
    t.start()

    # This is just for limiting the runtime of the other thread, so we stop eventually.
    # It doesn't really measure the runtime.
    t.join(time_limit)
    if t.is_alive():
        raise ExceededTimeError

    q_get = q.get()
    if isinstance(q_get, MemoryError):
        raise q_get
    return q_get
There is of course no mention of the object, only of the game functions that run it. I don't know why this is happening. It must be something very stupid, but I have no idea… I only made a simple copy of the code and I haven't changed this line…
Thanks in advance,
Eli
Your problem is here:
minimax = MiniMaxWithAlphaBetaPruningWithWDeepeningUntilRestfulness
(self.utility, self.color, self.no_more_time, self.w)
These are actually two separate lines, the second of which does nothing: it builds a tuple and discards it, while you intended the whole thing to be a single call expression. As a result, minimax is assigned the class itself instead of an instance, so when minimax.search is invoked later, the five arguments you pass fill the self, state, depth, alpha, and beta parameters, leaving maximizing_player missing, which is exactly the TypeError in the traceback.
You can put everything on one line, or just move the opening parenthesis to the first line (as Python allows expressions to continue on the next line when parentheses are left open):
minimax = MiniMaxWithAlphaBetaPruningWithWDeepeningUntilRestfulness(
    self.utility, self.color, self.no_more_time, self.w)
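A standalone toy that reproduces the symptom (hypothetical names, not the game code):

class Searcher:
    def __init__(self, w):
        self.w = w
    def search(self, depth, maximizing_player):
        return depth * self.w

s = Searcher          # forgot the parentheses: s is now the class itself
(0.25)                # a complete, useless expression; 0.25 is evaluated and discarded
s.search(3, True)     # TypeError: search() missing 1 required positional argument:
                      # 'maximizing_player' -- 3 is bound to self, True to depth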