Minimax explanation "for dummies"

Minimax explanation "for dummies" - python

I'm quite new to algorithms and i was trying to understand the minimax, i read a lot of articles,but i still can't get how to implement it into a tic-tac-toe game in python.
Can you try to explain it to me as easy as possible maybe with some pseudo-code or some python code?.
I just need to understand how it works. i read a lot of stuff about that and i understood the basic, but i still can't get how it can return a move.
If you can please don't link me tutorials and samples like (http://en.literateprograms.org/Tic_Tac_Toe_(Python)) , i know that they are good, but i simply need a idiot explanation.
thank you for your time :)

the idea of "minimax" is that there in a two-player game, one player is trying to maximize some form of score and another player is trying to minimize it. For example, in Tic-Tac-Toe the win of X might be scored as +1 and the win of O as -1. X would be the max player, trying to maximize the final score and O would be the min player, trying to minimize the final score.
X is called the max player because when it is X's move, X needs to choose a move that maximizes the outcome after that move. When O players, O needs to choose a move that minimizes the outcome after that move. These rules are applied recursively, so that e.g. if there are only three board positions open to play, the best play of X is the one that forces O to choose a minimum-value move whose value is as high as possible.
In other words, the game-theoretic minimax value V for a board position B is defined as
V(B) = 1 if X has won in this position
V(B) = -1 if O has won in this position
V(B) = 0 if neither player has won and no more moves are possible (draw)
otherwise
V(B) = max(V(B1), ..., V(Bn)) where board positions B1..Bn are
the positions available for X, and it is X's move
V(B) = min(V(B1), ..., V(Bn)) where board positions B1..Bn are
the positions available for O, and it is O's move
The optimal strategy for X is always to move from B to Bi such that V(Bi) is maximum, i.e. corresponds to the gametheoretic value V(B), and for O, analogously, to choose a minimum successor position.
However, this is not usually possible to calculate in games like chess, because in order to calculate the gametheoretic value one needs to enumerate the whole game tree until final positions and that tree is usually extremely large. Therefore, a standard approach is to coin an "evaluation function" that maps board positions to scores that are hopefully correlated with the gametheoretic values. E.g. in chess programs evaluation functions tend to give positive score for material advantage, open columns etc. A minimax algorithm them minimaximizes the evaluation function score instead of the actual (uncomputable) gametheoretic value of a board position.
A significant, standard optimization to minimax is "alpha-beta pruning". It gives the same results as minimax search but faster. Minimax can be also casted in terms of "negamax" where the sign of the score is reversed at every search level. It is just an alternative way to implement minimax but handles players in a uniform fashion. Other game tree search methods include iterative deepening, proof-number search, and more.

Minimax is a way of exploring the space of potential moves in a two player game with alternating turns. You are trying to win, and your opponent is trying to prevent you from winning.
A key intuition is that if it's currently your turn, a two-move sequence that guarantees you a win isn't useful, because your opponent will not cooperate with you. You try to make moves that maximize your chances of winning and your opponent makes moves that minimize your chances of winning.
For that reason, it's not very useful to explore branches from moves that you make that are bad for you, or moves your opponent makes that are good for you.

Related

set variable number of input nodes in python neat

i have this simple game where there is a ball bouncing on the screen and the player can move left and right of the screen and shoot an arrow up to pop the ball, every time the player hits a ball, the ball bursts and splits into two smaller balls until they reach a minimum size and disappear.
I am trying to solve this game with a genetic algorithm based on the python neat library and on this tutorial on flappy bird https://www.youtube.com/watch?v=MMxFDaIOHsE&list=PLzMcBGfZo4-lwGZWXz5Qgta_YNX3_vLS2, so I have a configuration file in which I have to specify how many input nodes must be in the network, I had thought to give as input the player's x coordinate, the distance between the player's x-coordinate and the ball's x-coordinate and the distance between the player's y-coordinate and the ball's y-coordinate.
My problem is that at the beginning of the game I have only one ball but after a few moves I could have more balls in the screen so I should have a greater number of input nodes,the more balls there are on the screen the more input coordinates I have to provide to the network.
So how to set the number of input nodes in a variable way?
config-feedforward.txt file
"""
# network parameters
num_hidden = 0
num_inputs = 3 #this needs to be variable
num_outputs = 3
"""
python file
for index,player in enumerate(game.players):
balls_array_x = []
balls_array_y = []
for ball in game.balls:
balls_array_x.append(ball.x)
balls_array_x.append(ball.y)
output = np.argmax(nets[index].activate(("there may be a number of variable arguments here")))
#other...
final code
for index,player in enumerate(game.players):
balls_array_x = []
balls_array_y = []
for ball in game.balls:
balls_array_x.append(ball.x)
balls_array_y.append(ball.y)
distance_list = []
player_x = player.x
player_y = player.y
i = 0
while i < len(balls_array_x):
dist = math.sqrt((balls_array_x[i] - player_x) ** 2 + (balls_array_y[i] - player_y) ** 2)
distance_list.append(dist)
i+=1
i = 0
if len(distance_list) > 0:
nearest_ball = min(distance_list)
output = np.argmax(nets[index].activate((player.x,player.y,nearest_ball)))

This is a good question and as far as I can tell from a quick Google search hasn't been addressed for simple ML algorithms like NEAT.
Traditionally resizing methods of Deep NN (padding, cropping, RNNs, middle-layers, etc) can obviously not be applied here since NEAT explicitly encodes each single neuron and connection.
I am also not aware of any general method/trick to make the input size mutable for the traditional NEAT algorithm and frankly don't think there is one. Though I can think of a couple of changes to the algorithm that would make this possible, but that's of no help to you I suppose.
In my opinion you therefore have 3 options:
You increase the input size to the maximum number of balls the algorithm should track and set the x-diff/y-diff value of non-existent balls to an otherwise impossible number (e.g. -1). If balls come into existence you actually set the values for those x-diff/y-diff input neurons and set them to -1 again when they are gone. Then you let NEAT figure it out. Also worth thinking about concatenating 2 separate NEAT NNs, with the first NN having 2 inputs, 1 output and the second NN having 1 (player pos) + x (max number of balls) inputs and 2 outputs (left, right). The first NN produces an output for each ball position (and is identical for each ball) and the second NN takes the first NNs output and turns it into an action. Also: The maximum number of balls doesn't have to be the maximum number of displayable balls, but can also be limited to 10 and only considering the 10 closest balls.
You only consider 1 ball for each action side (making your input 1 + 2*2). This could be the consideration of the lowest ball on each side or the closest ball on each side. Such preprocessing can make such simple NN tasks however quite easy to solve. Maybe you can add inertia into your test environment and thereby add a non-linearity that makes it not so straightforward to always teleport/hurry to the lowest ball.
You input the whole observation space into NEAT (or a uniformly downsampled fraction), e.g. the whole game at whatever resolution is lowest but still sensible. I know that this observation space is huge, but NEAT works quite well in handling such spaces.
I know that this is not the variable input size option of NEAT that you might have hoped for, but I don't know about any such general option/trick without changing the underlying NEAT algorithm significantly.
However, I am very happy to be corrected if someone knows a better option!

Genetic programming for simple games, feasible for non-experts? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
I have been seriously trying to create a genetic program that will evolve to play tic-tac-toe in an acceptable way. I'm using the genome to generate a function that will then take the board as input and output the result... But it's not working.
Can this program be written in less than 500 lines of code (including blank lines and documentation)? Perhaps my problem is that I'm generating AIs that are too simple.
My research
A Genetic Algorithm for Tic-Tac-Toe (very different from my approach).
http://www.tropicalcoder.com/GeneticAlgorithm.htm (too abstract).
In quite a good number of websites there are references to 'neural networks'. Are they really required?
Important disclaimers
This is NOT homework of any kind, just a personal project for the sake of learning something cool.
This is NOT a 'give me the codz plz', I am looking for high level suggestions. I explicitly don't want ready-made solutions as the answers.
Please give me some help and insight into this 'genetic programming' concept applied to this specific simple problem.
#OnABauer: I think that I am using genetic programming because quoting Wikipedia
In artificial intelligence, genetic programming (GP) is an
evolutionary algorithm-based methodology inspired by biological
evolution to find computer programs that perform a user-defined task.
And I am trying to generate a program (in this case function) that will perform the task of playing tic-tac-toe, you can see it because the output of the most important genetic_process function is a genome that will then be converted to a function, thus if I understood correctly this is genetic programming because the output is a function.
Program introspection and possible bugs/problems
The code runs with no errors nor crashes. The problem is that in the end what I get is an incompetent AI, that will attempt to make illegal move and be punished with losing each and every time. It is no better than random.
Possibly it is because my AI function is so simple: just making calculations on the stored values of the squares with no conditionals.
High level description
What is your chromosome meant to represent?
My cromosome rapresents a list of functions that will then be used to reduce over the array of the board stored as trinary. OK let me make an example:
* Cromosome is: amMMMdsa (length of cromosome must be 8).
1. The first step is converting this to functions following the lookup at the top called LETTERS_TO_FUNCTIONS, this give the functions: [op.add,op.mul,op.mod,op.mod,op.mod,op.floordiv,op.sub,op.add]
The second step is converting the board to a trinary rapresentation. So let's say the board is "OX XOX " we will get [2, 3, 1, 1, 3, 2, 3, 1, 1]
The third step is reducing the trinary raprentation using the functions obtained above. That is best explained by the function down below:
def reduce_by_all_functions(numbers_list,functions_list):
"""
Applies all the functions in a list to a list of numbers.
>>> reduce_by_all_functions([3,4,6],[op.add,op.sub])
1
>>> reduce_by_all_functions([6,2,4],[op.mul,op.floordiv])
3
"""
if len(functions_list) != len(numbers_list) - 1:
raise ValueError("The functions must be exactly one less than the numbers")
result = numbers_list[0]
for index,n in enumerate(numbers_list[1:]):
result = functions_list[index](result,n)
return result
Thus yielding the result of: 0 that means that the ai decided to go in the first square
What is your fitness function?
Luckly this is easy to answer.
def ai_fitness(genome,accuracy):
"""
Returns how good an ai is by letting it play against a random ai many times.
The higher the value, the best the ai
"""
ai = from_genome_to_ai(genome)
return decide_best_ai(ai,random_ai,accuracy)
How does your mutation work?
The son ereditates 80% of the genes from the father and 20% of genes from the mother. There is no kind of random mutation besides that.
And how is that reduce_by_all_functions() being used? I see that it
takes a board and a chromosome and returns a number. How is that
number used, what is it meant to represent, and... why is it being
returned modulo 9?
reduce_by_all_functions() is used to actually apply the functions previously obtained by the cromosome. The number is the square the ai wants to take. It is modulo 9 because it must be between 0 and 8 because the board is 9 spaces.
My code so far:
import doctest
import random
import operator as op
SPACE = ' '
MARK_OF_PLAYER_1 = "X"
MARK_OF_PLAYER_2 = "O"
EMPTY_MARK = SPACE
BOARD_NUMBERS = """
The moves are numbered as follows:
0 | 1 | 2
---------
3 | 4 | 5
---------
6 | 7 | 8
"""
WINNING_TRIPLETS = [ (0,1,2), (3,4,5), (6,7,8),
(0,3,6), (1,4,7), (2,5,8),
(0,4,8), (2,4,6) ]
LETTERS_TO_FUNCTIONS = {
'a': op.add,
'm': op.mul,
'M': op.mod,
's': op.sub,
'd': op.floordiv
}
def encode_board_as_trinary(board):
"""
Given a board, replaces the symbols with the numbers
1,2,3 in order to make further processing easier.
>>> encode_board_as_trinary("OX XOX ")
[2, 3, 1, 1, 3, 2, 3, 1, 1]
>>> encode_board_as_trinary(" OOOXXX")
[1, 1, 1, 2, 2, 2, 3, 3, 3]
"""
board = ''.join(board)
board = board.replace(MARK_OF_PLAYER_1,'3')
board = board.replace(MARK_OF_PLAYER_2,'2')
board = board.replace(EMPTY_MARK,'1')
return list((int(square) for square in board))
def create_random_genome(length):
"""
Creates a random genome (that is a sequences of genes, from which
the ai will be generated. It consists of randoom letters taken
from the keys of LETTERS_TO_FUNCTIONS
>>> random.seed("EXAMPLE")
# Test is not possible because even with the same
# seed it gives different results each run...
"""
letters = [letter for letter in LETTERS_TO_FUNCTIONS]
return [random.choice(letters) for _ in range(length)]
def reduce_by_all_functions(numbers_list,functions_list):
"""
Applies all the functions in a list to a list of numbers.
>>> reduce_by_all_functions([3,4,6],[op.add,op.sub])
1
>>> reduce_by_all_functions([6,2,4],[op.mul,op.floordiv])
3
"""
if len(functions_list) != len(numbers_list) - 1:
raise ValueError("The functions must be exactly one less than the numbers")
result = numbers_list[0]
for index,n in enumerate(numbers_list[1:]):
result = functions_list[index](result,n)
return result
def from_genome_to_ai(genome):
"""
Creates an AI following the rules written in the genome (the same as DNA does).
Each letter corresponds to a function as written in LETTERS_TO_FUNCTIONS.
The resulting ai will reduce the board using the functions obtained.
>>> ai = from_genome_to_ai("amMaMMss")
>>> ai("XOX OXO")
4
"""
functions = [LETTERS_TO_FUNCTIONS[gene] for gene in genome]
def ai(board):
return reduce_by_all_functions(encode_board_as_trinary(board),functions) % 9
return ai
def take_first_empty_ai(board):
"""
Very simple example ai for tic-tac-toe
that takes the first empty square.
>>> take_first_empty_ai(' OX O XXO')
0
>>> take_first_empty_ai('XOX O XXO')
3
"""
return board.index(SPACE)
def get_legal_moves(board):
"""
Given a tic-tac-toe board returns the indexes of all
the squares in which it is possible to play, i.e.
the empty squares.
>>> list(get_legal_moves('XOX O XXO'))
[3, 5]
>>> list(get_legal_moves('X O XXO'))
[1, 2, 3, 5]
"""
for index,square in enumerate(board):
if square == SPACE:
yield index
def random_ai(board):
"""
The simplest possible tic-tac-toe 'ai', just
randomly choses a random move.
>>> random.seed("EXAMPLE")
>>> random_ai('X O XXO')
3
"""
legal_moves = list(get_legal_moves(board))
return random.choice(legal_moves)
def printable_board(board):
"""
User Interface function:
returns an easy to understand representation
of the board.
"""
return """
{} | {} | {}
---------
{} | {} | {}
---------
{} | {} | {}""".format(*board)
def human_interface(board):
"""
Allows the user to play tic-tac-toe.
Shows him the board, the board numbers and then asks
him to select a move.
"""
print("The board is:")
print(printable_board(board))
print(BOARD_NUMBERS)
return(int(input("Your move is: ")))
def end_result(board):
"""
Evaluates a board returning:
0.5 if it is a tie
1 if MARK_OF_PLAYER_1 won # default to 'X'
0 if MARK_OF_PLAYER_2 won # default to 'O'
else if nothing of the above applies return None
>>> end_result('XXX OXO')
1
>>> end_result(' O X X O')
None
>>> end_result('OXOXXOXOO')
0.5
"""
if SPACE not in board:
return 0.5
for triplet in WINNING_TRIPLETS:
if all(board[square] == 'X' for square in triplet):
return 1
elif all(board[square] == 'O' for square in triplet):
return 0
def game_ended(board):
"""
Small syntactic sugar function to if the game is ended
i.e. no tie nor win occured
"""
return end_result(board) is not None
def play_ai_tic_tac_toe(ai_1,ai_2):
"""
Plays a game between two different ai-s, returning the result.
It should be noted that this function can be used also to let the user
play against an ai, just call it like: play_ai_tic_tac_toe(random_ai,human_interface)
>>> play_ai_tic_tac_toe(take_first_empty_ai,take_first_empty_ai)
1
"""
board = [SPACE for _ in range(9)]
PLAYER_1_WIN = 1
PLAYER_1_LOSS = 0
while True:
for ai,check in ( (ai_1,MARK_OF_PLAYER_1), (ai_2,MARK_OF_PLAYER_2) ):
move = ai(board)
# If move is invalid you lose
if board[move] != EMPTY_MARK:
if check == MARK_OF_PLAYER_1:
return PLAYER_1_LOSS
else:
return PLAYER_1_WIN
board[move] = check
if game_ended(board):
return end_result(board)
def loop_play_ai_tic_tac_toe(ai_1,ai_2,games_number):
"""
Plays games number games between ai_1 and ai_2
"""
return sum(( play_ai_tic_tac_toe(ai_1,ai_2)) for _ in range(games_number))
def decide_best_ai(ai_1,ai_2,accuracy):
"""
Returns the number of times the first ai is better than the second:
ex. if the ouput is 1.4, the first ai is 1.4 times better than the second.
>>> decide_best_ai(take_first_empty_ai,random_ai,100) > 0.80
True
"""
return sum((loop_play_ai_tic_tac_toe(ai_1,ai_2,accuracy//2),
loop_play_ai_tic_tac_toe(ai_2,ai_1,accuracy//2))) / (accuracy // 2)
def ai_fitness(genome,accuracy):
"""
Returns how good an ai is by lettting it play against a random ai many times.
The higher the value, the best the ai
"""
ai = from_genome_to_ai(genome)
return decide_best_ai(ai,random_ai,accuracy)
def sort_by_fitness(genomes,accuracy):
"""
Syntactic sugar for sorting a list of genomes based on the fitness.
High accuracy will yield a more accurate ordering but at the cost of more
computation time.
"""
def fit(genome):
return ai_fitness(genome,accuracy)
return list(sorted(genomes, key=fit, reverse=True))
# probable bug-fix because high fitness means better individual
def make_child(a,b):
"""
Returns a mix of cromosome a and cromosome b.
There is a bias towards cromosome a because I think that
a too weird soon is going to be bad.
"""
result = []
for index,char_a in enumerate(a):
char_b = b[index]
if random.random() > 0.8:
result.append(char_a)
else:
result.append(char_b)
return result
def genetic_process(population_size,generation_number,accuracy,elite_number):
"""
A full genetic process yielding a good tic-tac-toe ai. # not yet
# Parameters:
# population_size: the number of ai-s that you allow to be alive
at once
# generation_number: the number of generations of the gentetic
# accuracy: how well the ai-s are ordered,
low accuracy means that a good ai may be considered bad or
viceversa. High accuarcy is computationally costly
# elite_number: the number of best programmes that get to reproduce
at each generation.
# Return:
# A genome for a tic-tac-toe ai
"""
pool = [create_random_genome(9-1) for _ in range(population_size)]
for generation in range(generation_number):
best_individuals = sort_by_fitness(pool,accuracy)[:elite_number]
the_best = best_individuals[0]
for good_individual in best_individuals:
pool.append(make_child(the_best,good_individual))
pool = sort_by_fitness(pool,accuracy)[:population_size]
return the_best
def _test():
"""
Tests all the script by running the
>>> 2 + 2 # code formatted like this
4
"""
doctest.testmod()
def main():
"""
A simple demo to let the user play against a genetic opponent.
"""
print("The genetic ai is being created... please wait.")
genetic_ai = from_genome_to_ai(genetic_process(50,4,40,25))
play_ai_tic_tac_toe(genetic_ai,human_interface)
if __name__ == '__main__':
main()

First and foremost, I am obligated to say that Tic Tac Toe is really too simple a problem to reasonably attack with a genetic program. You simply don't need the power of a GP to win Tic Tac Toe; you can solve it with a brute force lookup table, or a simple game tree.
That said, if I understand correctly, your basic notion is this:
1) Create chromosomes of length 8, where each gene is an arithmetic operation, and the 8-gene chromosome acts on each board as a board evaluation function. That is, a chromosome takes in a board representation, and spits out a number representing the goodness of that board.
It's not perfectly clear that this is what you're doing, because your board representations are each 9 integers (1, 2, 3 only) but your examples are given in terms of the "winning triples" which are 2 integers (0 through 8).
2) Start the AI up and, on the AI's turn it should get a list of all legal moves, evaluate the board per its chromosome for each legal move and... take that number, modulo 9, and use that as the next move? Surely there's some code in there to handle the case where that move is illegal....
3) Let a bunch of these chromosome representations either play a standard implementation, or play each other, and determine the fitness based on the number of wins.
4) Once a whole generation of chromosomes has been evaluated, create a new generation. It's not clear to me how you are selecting the parents from the pool, but once the parents are selected, a child is produced by just taking individual genes from the parents by an 80-20 rule.
Your overall high level strategy is sound, but there are a lot of conceptual and implementation flaws in the execution. First, let's talk about fully observable games and simple ways to make AIs for them. If the game is very simple (such as Tic Tac Toe) you can simply make a brute force minimax game tree, such as this. TTT is simple enough that even your cell phone can go all the way to the bottom of the tree very quickly. You can even solve it by brute force with a look up table: Just make a list of all board positions and the response to each one.
When the games get larger-- think checkers, chess, go-- that is no longer true, and one of the ways around this is to develop what's called a board evaluation function. It is a function which takes a board position and returns a number, usually with higher being better for one player and lower being better for the other. One then executes a search to certain acceptable depth and aims for the highest (say) board evaluation function.
This begs the question: How do we come up with the board evaluation function? Originally, one asked experts at the game to develop these function for you. There is a great paper by Chellapilla and Fogel which is similar to what you want to do for checkers-- they use neural networks to determine the board evaluation functions, and, critically, these neural networks are encoded as genomes and evolved. They are then used in search depth 4 trees. The end results are very competitive against human players.
You should read that paper.
What you are trying to do, I think, is very similar, except instead of coding a neural network as a chromosome, you're trying to code up a very restricted algebraic expression, always of the form:
((((((((arg1 op1 arg2) op2 arg3) op3 arg4) op4 arg5) op5 arg6) op6 arg7) op7 arg8) op8 arg)
... and then you're using it mod 9 to pick a move.
Now let's talk about genetic algorithms, genetic programs, and the creation of new children. The whole idea in evolutionary techniques is to combine the best attributes of two hopefully-good solutions in the hopes that they will be even better, without getting stuck in a local maximum.
Generally, this is done by touranment selection, crossover, and mutation. Tournament selection means selecting the parents proportionally to their fitness. Crossover means dividing the chromosomes into two usually contiguous regions and taking one region from one parent and the other region from the other parent. (Why contiguous? Because Holland's Schema Theorem) Mutation means occasionally changing a gene, as a means of maintaining population diversity.
Now let's look at what you're doing:
1) Your board evaluation function-- the function that your chromosome turns into, which acts on the board positions-- is highly restricted and very arbitrary. There's not much rhyme or reason to assigning 1, 2, and 3 as those numbers, but that might possibly be okay. The bigger flaw is that your functions are a terribly restricted part of the overall space of functions. They are always the same length, and the parse tree always looks the same.
There's no reason to expect anything useful to be in this restrictive space. What's needed is to come up with a scheme which allows for a much more general set of parse trees, including crossover and mutation schemes. You should look up some papers or books by John Koza for ideas on this topic.
Note that Chellapilla and Fogel have fixed forms of functions as well, but their chromosomes are substantially larger than their board representations. A checkers board has 32 playable spaces, and each space can have 5 states. But their neural network had some 85 nodes, and the chromosome comprised the connection weights of those nodes-- hundreds, if not thousands, of values.
2) Then there's this whole modulo 9 thing. I don't understand why you're doing that. Don't do that. You're just scrambling whatever information might be in your chromosomes.
3) Your function to make new children is bad. Even as a genetic algorithm, you should be dividing the chromosomes in two (at random points) and taking part of one parent from one side, and the other part from the other parent on the other side. For genetic programming, which is what you're doing, there are analogous strategies for doing crossovers on parse trees. See Koza.
You must include mutation, or you will almost certainly get sub-optimal results.
4a) If you evaluate the fitness by playing against a competent AI, then realize that your chromosomes will never, ever win. They will lose, or they will draw. A competent AI will never lose. Moreover, it is likely that your AIs will lose all the time and initial generations may all come out as equally (catastrophically) poor players. It's not impossible to get yourelf out of that hole, but it will be hard.
4b) On the other hand, if, like Chellapilla and Fogel, you play the AIs against them selves, then you'd better make certain that the AIs can play either X or O. Otherwise you're never going to make any progress at all.
5) Finally, even if all these concerns are addressed, I'm not convinced this will get great results. Note that the checkers example searches to a depth of 4, which is not a big horizon in a game of checkers that might last 20 or 30 moves.
TTT can only ever last 9 moves.
If you don't do a search tree and just go for the highest board evaluation function, you might get something that works. You might not. I'm not sure. If you search to depth 4, you might as well skip to a full search to level 9 and do this conventionally.

connect 4 minimax algorithm: one for loop

I'm trying to write the minimax algorithm in python with one for loop (yes I know wikipedia says the min and max players are often treated separately), and I'm using the variable turn to keep track of whether the min or max player is currently exploring options. I think, however, that at present the code wrongly evaluates for X when it is the O player's turn and O when it is the X player's turn.
Here's the source (p12) : http://web.cs.wpi.edu/~rich/courses/imgd4000-d10/lectures/E-MiniMax.pdf
Things you might be wondering about:
b is a list of lists; 0 denotes an available space
evaluate is used both for checking for a victory (by default) as well as for scoring the board for a particular player (we look for places where the value of a cell on the board ).
makeMove returns the row of the column the piece is placed in (used for subsequent removal)
Any help would be very much appreciated. Please let me know if anything is unclear.
def minMax(b, turn, depth=0):
player, piece = None, None
best, move = None, -1
if turn % 2 == 0 : # even player is max player
player, piece = 'max', 'X'
best, move = -1000, -1
else :
player, piece = 'min', 'O'
best, move = 1000, -1
if boardFull(b) or depth == MAX_DEPTH:
return evaluate(b, False, piece)
for col in range(N_COLS):
if possibleMove(b, col) :
row = makeMove(b, col, piece)
turn += 1 # now the other player's turn
score = minMax(b, turn, depth+1)
if player == 'max':
if score > best:
best, move = score, col
else:
if score < best:
best, move = score, col
reset(b, row, col)
return move
#seaotternerd. Yes I was wondering about that. But I'm not sure that is the problem. Here is one printout. As you can see, X has been dropped in the fourth column by AI but is evaluating from the min player's perspective (it counts 2 O units in the far right column).
Here's what the evaluate function determines, depending on piece:
if piece == 'O':
return best * -25
return best * 25

You are incrementing turn every time that you find a possible move and not undoing it. As a result, when control returns to a given minMax call, turn is 1 greater than it was before. Then, the next time your program finds a possible move, it increments turn again. This will cause the next call to minMax to select the wrong player as the current one. Overall, I believe this will result in the board getting evaluated for the wrong player approximately half the time. You can fix this by adding 1 to turn in the recursive call to minMax(), rather than by changing the value stored in the variables:
row = makeMove(b, col, piece)
score = minMax(b, turn+1, depth+1)
EDIT: Digging deeper into your code, I'm finding a number of additional problems:
MAX_DEPTH is set to 1. This will not allow the ai to see its own next move, instead forcing it to make decisions solely based on getting in the way of the other player.
minMax() returns the score if it has reached MAX_DEPTH or a win condition, but otherwise it returns a move. This breaks propagation of the score back up the recursion tree.
This is not critical, but it's something to keep in mind: your board evaluation function only takes into account how long a given player's longest string is, ignoring how the other player is doing and any other factors that may make one placement better than another. This mostly just means that your AI won't be very "smart."
EDIT 2: A big part of the problem with the way that you're keeping track of min and max is in your evaluation function. You check to see if each piece has won. You are then basing the score of that board off of who the current player is, but the point of having a min player and a max player is that you don't need to know who the current player is to evaluate the board. If max has won, the score is infinity. If min has won, the score is -infinity.
def evaluate(b, piece):
if evaluate_aux(b, True, 'X'):
return 100000
if evaluate_aux(b, True, 'O'):
return -100000
return evaluate_aux(b, False, piece)
In general, I think there is a lot that you could do to make your code cleaner and easier to read, which would make it a lot easier to detect errors. For instance, if you are saying that "X" is always max and "Y" is always min, then you don't need to bother keeping track of both player and piece. Additionally, having evaluate_aux sometimes return a boolean and sometimes return an int is confusing. You could just have it count the number of each piece in a row, for instance, with contiguous "X"s counting positive and contiguous "O"s counting negative and sum the scores; an evaluation function isn't supposed to be from one player's perspective or the other. Obviously you would still need to have a check for win conditions in there. This would also address point 3.
It's possible that there are more problems, but like I said, this code is not particularly easy to wade through. If you fix the things that I've already found and clean it up, I can take another look.

How to generate statistically probably locations for ships in battleship

I made the original battleship and now I'm looking to upgrade my AI from random guessing to guessing statistically probably locations. I'm having trouble finding algorithms online, so my question is what kinds of algorithms already exist for this application? And how would I implement one?
Ships: 5, 4, 3, 3, 2
Field: 10X10
Board:
OCEAN = "O"
FIRE = "X"
HIT = "*"
SIZE = 10
SEA = [] # Blank Board
for x in range(SIZE):
SEA.append([OCEAN] * SIZE)
If you'd like to see the rest of the code, I posted it here: (https://github.com/Dbz/Battleship/blob/master/BattleShip.py); I didn't want to clutter the question with a lot of irrelevant code.

The ultimate naive solution wold be to go through every possible placement of ships (legal given what information is known) and counting the number of times each square is full.
obviously, in a relatively empty board this will not work as there are too many permutations, but a good start might be:
for each square on board: go through all ships and count in how many different ways it fits in that square, i.e. for each square of the ships length check if it fits horizontally and vertically.
an improvement might be to also check for each possible ship placement if the rest of the ships can be placed legally whilst covering all known 'hits' (places known to contain a ship).
to improve performance, if only one ship can be placed in a given spot, you no longer need to test it on other spots. also, when there are many 'hits', it might be quicker to first cover all known 'hits' and for each possible cover go through the rest.
edit: you might want to look into DFS.
Edit: Elaboration on OP's (#Dbz) suggestion in the comments:
hold a set of dismissed placements ('dissmissed') of ships (can be represented as string, say "4V5x3" for the placement of length 4 ship in 5x3, 5x4, 5x5, 5x6), after a guess you add all the placements the guess dismisses, then for each square hold a set of placements that intersect with it ('placements[x,y]') then the probability would be:
34-|intersection(placements[x,y], dissmissed)|/(3400-|dismissed|)
To add to the dismissed list:
if guess at (X,Y) is a miss add placements[x,y]
if guess at (X,Y) is a hit:
add neighboring placements (assuming that ships cannot be placed adjacently), i.e. add:
<(2,3a,3b,4,5)>H<X+1>x<Y>, <(2,3a,3b,4,5)>V<X>x<Y+1>
<(2,3a,3b,4,5)>H<X-(2,3,3,4,5)>x<Y>, <(2,3a,3b,4,5)>V<X>x<Y-(2,3,3,4,5)>
2H<X+-1>x<Y+(-2 to 1)>, 3aH<X+-1>x<Y+(-3 to 1)> ...
2V<X+(-2 to 1)>x<Y+-1>, 3aV<X+(-3 to 1)>x<Y+-1> ...
if |intersection(placements[x,y], dissmissed)|==33, i.e. only one placement possible add ship (see later)
check if any of the previews hits has only one possible placement left, if so, add the ship
check to see if any of the ships have only possible placement, if so, add the ship
adding a ship:
add all other placements of that ship to dismissed
for each (x,y) of the ships placement add placements[x,y] with out the actual placement
for each (x,y) of the ships placement mark as hit guess (if not already known) run stage 2
for each (x,y) neighboring the ships placement mark as miss guess (if not already known) run stage 1
run stage 3 and 4.
i might have over complicated this, there might be some redundant actions, but you get the point.

Nice question, and I like your idea for statistical approach.
I think I would have tried a machine learning approach for this problem as follows:
First model your problem as a classification problem.
The classification problem is: Given a square (x,y) - you want to tell the likelihood of having a ship in this square. Let this likelihood be p.
Next, you need to develop some 'features'. You can take the surrounding of (x,y) [as you might have partial knowledge on it] as your features.
For example, the features of the middle of the following mini-board (+ indicates the square you want to determine if there is a ship or not in):
OO*
O+*
?O?
can be something like:
f1 = (0,0) = false
f2 = (0,1) = false
f3 = (0,2) = true
f4 = (1,0) = false
**note skipping (1,1)
f5 = (1,2) = true
f6 = (2,0) = unknown
f7 = (2,1) = false
f8 = (2,2) = unknown
I'd implement features relative to the point of origin (in this case - (1,1)) and not as absolute location on board (so the square up to (3,3) will also be f2).
Now, create a training set. The training set is a 'labeled' set of features - based on some real boards. You can create it manually (create a lot of boards), automatically by a random generator of placements, or by some other data you can gather.
Feed the training set to a learning algorithm. The algorithm should be able to handle 'unknowns' and be able to give probability of "true" and not only a boolean answer. I think a variation of Naive Bayes can fit well here.
After you have got a classifier - exploit it with your AI.
When it's your turn, choose to fire upon a square which has the maximal value of p. At first, the shots will be kinda random - but with more shots you fire, you will have more information on the board, and the AI will exploit it for better predictions.
Note that I gave features based on a square of size 1. You can of course choose any k and find features on this bigger square - it will give you more features, but each might be less informative. There is no rule of thumb which will be better - and it should be tested.

Main question is, how are you going to find statistically probable locations. Are they already known or you want to figure them out?
Either case, I'd just make the grid weighed. In your case, the initial weight for each slot would be 1.0/(SIZE^2). The sum of weights must be equal to 1.
You can then adjust weights based on the statistics gathered from N last played games.
Now, when your AI makes a choice, it chooses a coordinate to hit based on weighed probabilities. The quick and simple way to do that would be:
Generate a random number R in range [0..1]
Start from slot (0, 0) adding the weights, i.e. S = W(0, 0) + W(0, 1) + .... where W(n, m) is the weight of the corresponding slot. Once S >= R, you've got the coordinate to hit.
This can be optimised by pre-calculating cumulative weights for each row, have fun :)

Find out which ships are still alive:
alive = [2,2,3,4] # length of alive ships
Find out spots where you have not shot, for example with a numpy.where()
Loop over spots where you can shoot.
Check the sides of the given position. Go left and right, how many spaces? Go up and down, how many spaces? If you can fit a boat in that many spaces, you can fit any smaller boat, so this loop I'd do it from the largest ship downwards, and I'd add to the counts in this position as many +1 as ships smaller than the one that fits.
Once you have done all of this, the position with more points should be the most probable to attack and hit something.
Of course, it can get as complicated as you want. You can also ask yourself, instead of which is my next hit, which combinations of hits will give me the victory in less number of hits or any other combination/parametrization of the problem. Good luck!

Following a Dynamic Score

I have little to no formal discrete math training, and have run into a wee bit of an issue. I am trying to write an agent which reads in a human player's (arbitrary) score and scores a point every so often. The agent needs to "lag behind" and "catch up" every so often, so that the human player believes there is some competition going on. Then, the agent must either win or lose (depending on the condition) against the human.
I have tried a few different techniques, including a wonky probabilistic loop (which failed horribly). I was thinking that this problem calls for something like an emission Hidden Markov Model (HMM), but I'm not sure how to implement it (or even whether this is the best approach).
I have a gist up, but again, it sucks.
I hope the __main__ function provides some insight as to the goal of this agent. It is going to be called in pygame.

I think you may be over-thinking this. You can use simple probability to estimate how often and by how much the computer's score should "catch-up". Additionally, you can calculate the difference between the computer's score and human's score, and then feed this to a sigmoid-like function to give you the degree at which the computer's score increases.
Illustrative Python:
#!/usr/bin/python
import random, math
human_score = 0
computer_score = 0
trials = 100
computer_ahead_factor = 5 # maximum amount of points the computer can be ahead by
computer_catchup_prob = 0.33 # probability of computer catching up
computer_ahead_prob = 0.5 # probability of computer being ahead of human
computer_advantage_count = 0
for i in xrange(trials):
# Simulate player score increase.
human_score += random.randint(0,5) # add an arbitrary random amount
# Simulate computer lagging behind human, by calculating the probability of
# computer jumping ahead based on proximity to the human's score.
score_diff = human_score - computer_score
p = (math.atan(score_diff)/(math.pi/2.) + 1)/2.
if random.random() < computer_ahead_prob:
computer_score = human_score + random.randint(0,computer_ahead_factor)
elif random.random() < computer_catchup_prob:
computer_score += int(abs(score_diff)*p)
# Display scores.
print 'Human score:',human_score
print 'Computer score:',computer_score
computer_advantage_count += computer_score > human_score
print 'Effective computer advantage ratio: %.6f' % (computer_advantage_count/float(trials),)

I am making the assumption that the human cannot see the computer agent playing the game.  If this is the case, here is one idea you might try.
Create a list of all the possible point combinations that can be scored for any given move.  For each move, find a score range which you would like the agent to end up within after the current turn.  Reduce the set of possible move values to only the values which would end the agent in that particular range and randomly select one.  As conditions change for how far behind or ahead you would like the agent to get, simply slide your range appropriately.
If you are looking for something with some kind of built in and researched psychological effects for the human, I cant help you with that.  You will need to define more rules for us if you want something more specific to your situation than this.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.