Python: Reinforcement learning Tic-Tac-Toe AI working?

Python: Reinforcement learning Tic-Tac-Toe AI working? - python

So I'm scratching my head at this, and I don't know what's wrong. The code is here. The idea is that the AI plays games against itself, and is rewarded for winning or drawing, and punished for losing. Given a board, the AI will choose a move with certain weights. If the game ends up being a win or draw, those moves that it selected will have their weights increased. If the game ends up being a loss, those moves that it selected will have their weights decreased.
Instead what I observe is that 1) The 'X' player (Player 1) will almost always go for either the top left or bottom right square, rather than in the middle as expected, and 2) The 'X' player will become more and more favoured to win as the number of games increases.
I have no idea what is causing this behaviour, and I would appreciate any help.
Apparently stackoverflow requires you to also put code in to use pastebin, so here is the reward bit, although it probably makes more sense in the full context linked above.
foo = ai_player
for i in range(0,len(moves_made)):
# Find the index of the move made
weight_index = foo.children.index(moves_made[i])
# If X won
if checkWin(current_player.placements) == 1:
if i % 2 == 0:
foo.weights[weight_index] += 10
else:
foo.weights[weight_index] -= 10
if foo.weights[weight_index] <= 0: foo.weights[weight_index] = 0
# If O won
if checkWin(current_player.placements) == -1:
if i % 2 == 0:
foo.weights[weight_index] -= 10
if foo.weights[weight_index] <= 0: foo.weights[weight_index] = 0
else:
foo.weights[weight_index] += 10
# If it was a draw
if checkWin(current_player.placements) == 0:
if i % 2 == 0:
foo.weights[weight_index] += 5
else:
foo.weights[weight_index] += 5
foo = foo.children[weight_index]

There's a number of issues with your original code for trying to get agents to learn.
You use absolute weights instead of some calculated heuristic for the weights. When adding weights, you want to use an update mechanism (e.g. Q-Learning).
Wikipedia Q-Learning
This bounds the board weights between the range of possible rewards and allows new, promising options to rise, given positive rewards (having a score +100 or +200 will be very hard to beat).
You assign weightings to the board indexes and not board positions. This will teach agents that certain squares of the board are 'the best' and other squares aren't regardless of the utility. (i.e. if there are X's on square 0 and square 1, and O thinks their 'absolute' best move is bottom right, however X can win next turn).
Moves seem to be random in this environment (as far as I can see). The agents need some way of 'exploring' to find which moves work and which moves don't, and then 'exploiting' the best moves found so far.
You only store the 'state' values and not the value of taking actions in a given state. With this, agents can calculate the value of the state they're in, but they need a way of calculating the value of actions they can take, to inform them of the best actions to take.
In the Reinforcement Learning framework, the agent takes in the state and needs to evaluate the best possible action to take, given the state.
I've re-written your code, using a Q-learning table to get the agents moving appropriately (moving the agents into a class). Also using NumPy for argmax etc functions.
Q-Learning function is:
self.Q_table[(state, action)] = q_val + 0.1 * \
(reward + self.return_value[(state, action)] / self.return_number[(state, action)])
Code is in a pastebin.
With Epsilon of 0.08 and 1,000,000 episodes, the win rate for X was ~16%, Y was ~7% and about ~77% draws.
With Epsilon of 0.04 and 1,000,000 episodes, the win rate for X was ~9%, Y was ~3% and about ~88% draws.
With Epsilon of 0.04 and 2,000,000 episodes - almost identical results to 1m episodes.

It's hard to tell just like this but possibly you reward them for playing draws and it keeps happening?
Also, you should evaluate checkWin() once at the start and you could make a simple function like update(weight) and pass 10, -10 or 5 depending on your case
Is your algorithm:
0. init all weights to 100 for every board state
reset board
get board state
make a random (weighted) move depending on weights/board state
check if the game ends, if so update weights and go back to 1, else go to 2
?

Related

set variable number of input nodes in python neat

i have this simple game where there is a ball bouncing on the screen and the player can move left and right of the screen and shoot an arrow up to pop the ball, every time the player hits a ball, the ball bursts and splits into two smaller balls until they reach a minimum size and disappear.
I am trying to solve this game with a genetic algorithm based on the python neat library and on this tutorial on flappy bird https://www.youtube.com/watch?v=MMxFDaIOHsE&list=PLzMcBGfZo4-lwGZWXz5Qgta_YNX3_vLS2, so I have a configuration file in which I have to specify how many input nodes must be in the network, I had thought to give as input the player's x coordinate, the distance between the player's x-coordinate and the ball's x-coordinate and the distance between the player's y-coordinate and the ball's y-coordinate.
My problem is that at the beginning of the game I have only one ball but after a few moves I could have more balls in the screen so I should have a greater number of input nodes,the more balls there are on the screen the more input coordinates I have to provide to the network.
So how to set the number of input nodes in a variable way?
config-feedforward.txt file
"""
# network parameters
num_hidden = 0
num_inputs = 3 #this needs to be variable
num_outputs = 3
"""
python file
for index,player in enumerate(game.players):
balls_array_x = []
balls_array_y = []
for ball in game.balls:
balls_array_x.append(ball.x)
balls_array_x.append(ball.y)
output = np.argmax(nets[index].activate(("there may be a number of variable arguments here")))
#other...
final code
for index,player in enumerate(game.players):
balls_array_x = []
balls_array_y = []
for ball in game.balls:
balls_array_x.append(ball.x)
balls_array_y.append(ball.y)
distance_list = []
player_x = player.x
player_y = player.y
i = 0
while i < len(balls_array_x):
dist = math.sqrt((balls_array_x[i] - player_x) ** 2 + (balls_array_y[i] - player_y) ** 2)
distance_list.append(dist)
i+=1
i = 0
if len(distance_list) > 0:
nearest_ball = min(distance_list)
output = np.argmax(nets[index].activate((player.x,player.y,nearest_ball)))

This is a good question and as far as I can tell from a quick Google search hasn't been addressed for simple ML algorithms like NEAT.
Traditionally resizing methods of Deep NN (padding, cropping, RNNs, middle-layers, etc) can obviously not be applied here since NEAT explicitly encodes each single neuron and connection.
I am also not aware of any general method/trick to make the input size mutable for the traditional NEAT algorithm and frankly don't think there is one. Though I can think of a couple of changes to the algorithm that would make this possible, but that's of no help to you I suppose.
In my opinion you therefore have 3 options:
You increase the input size to the maximum number of balls the algorithm should track and set the x-diff/y-diff value of non-existent balls to an otherwise impossible number (e.g. -1). If balls come into existence you actually set the values for those x-diff/y-diff input neurons and set them to -1 again when they are gone. Then you let NEAT figure it out. Also worth thinking about concatenating 2 separate NEAT NNs, with the first NN having 2 inputs, 1 output and the second NN having 1 (player pos) + x (max number of balls) inputs and 2 outputs (left, right). The first NN produces an output for each ball position (and is identical for each ball) and the second NN takes the first NNs output and turns it into an action. Also: The maximum number of balls doesn't have to be the maximum number of displayable balls, but can also be limited to 10 and only considering the 10 closest balls.
You only consider 1 ball for each action side (making your input 1 + 2*2). This could be the consideration of the lowest ball on each side or the closest ball on each side. Such preprocessing can make such simple NN tasks however quite easy to solve. Maybe you can add inertia into your test environment and thereby add a non-linearity that makes it not so straightforward to always teleport/hurry to the lowest ball.
You input the whole observation space into NEAT (or a uniformly downsampled fraction), e.g. the whole game at whatever resolution is lowest but still sensible. I know that this observation space is huge, but NEAT works quite well in handling such spaces.
I know that this is not the variable input size option of NEAT that you might have hoped for, but I don't know about any such general option/trick without changing the underlying NEAT algorithm significantly.
However, I am very happy to be corrected if someone knows a better option!

Optimize permutations search loop (can't use itertools) that is extremely slow. Any suggestions?

This is a game where you have 12 cards and you pick you until you choose 3 from the same group. I am attempting to find the probability of choosing each group. The script that I have created works, but it is extremely slow. My coworker created a similar script in R without the functions and his script takes 1/100th the time that mine takes. I am just trying to figure out why. Any ideas would be greatly appreciated.
from collections import Counter
import pandas as pd
from datetime import datetime
weight = pd.read_excel('V01Weights.xlsx')
Weight looks like the following:
Symb Weight
Grand 170000
Grand 170000
Grand 105
Major 170000
Major 170000
Major 215
Minor 150000
Minor 150000
Minor 12000
Bonus 105000
Bonus 105000
Bonus 105000
Max Picks represents the total number of different "cards". Total Picks represents the max number of user choices. This is because after 8 choices, you are guaranteed to have 2 of each type so on the 9th pick, you are guaranteed to have 3 matching.
TotalPicks = 9
MaxPicks = 12
This should have been named PickedProbabilities.
Picks = {0:0,1:0,2:0,3:0}
This is my simple version of the timeit class because I don't like the timeit class
def Time_It(function):
start =datetime.now()
x = function()
finish = datetime.now()
TotalTime = finish - start
Minutes = int(TotalTime.seconds/60)
Seconds = TotalTime.seconds % 60
print('It took ' + str(Minutes) + ' minutes and ' + str(Seconds) + ' seconds')
return(x)
Given x(my picks in order) I find the probability. These picks are done without replacement
def Get_Prob(x,weight):
prob = 1
weights = weight.iloc[:,1]
for index in x:
num = weights[index]
denom = sum(weights)
prob *= num/denom
weights.drop(index, inplace = True)
# print(weights)
return(prob)
This is used to determine if there are duplicates in my loop because that is not allowed
def Is_Allowed(x):
return(len(x) == len(set(x)))
This determines if a win is present in all of the cards present thus far.
def Is_Win(x):
global Picks
WinTypes = [[0,1,2],[3,4,5],[6,7,8],[9,10,11]]
IsWin = False
for index,item in enumerate(WinTypes):
# print(index)
if set(item).issubset(set(x)):
IsWin = True
Picks[index] += Get_Prob(x,weight)
# print(Picks[index])
print(sum(Picks.values()))
break
return(IsWin)
This is my main function that cycles through all of the cards. I attempted to do this using recursion but I eventually gave up. I can't use itertools to create all of the permutations because for example [0,1,2,3,4] will be created by itertools but this is not possible because once you get 3 matching, the game ends.
def Cycle():
for a in range(MaxPicks):
x = [a]
for b in range(MaxPicks):
x = [a,b]
if Is_Allowed(x):
for c in range(MaxPicks):
x = [a,b,c]
if Is_Allowed(x):
if Is_Win(x):
# print(x)
continue
for d in range(MaxPicks):
x = [a,b,c,d]
if Is_Allowed(x):
if Is_Win(x):
# print(x)
continue
for e in range(MaxPicks):
x = [a,b,c,d,e]
if Is_Allowed(x):
if Is_Win(x):
continue
for f in range(MaxPicks):
x = [a,b,c,d,e,f]
if Is_Allowed(x):
if Is_Win(x):
continue
for g in range(MaxPicks):
x = [a,b,c,d,e,f,g]
if Is_Allowed(x):
if Is_Win(x):
continue
for h in range(MaxPicks):
x = [a,b,c,d,e,f,g,h]
if Is_Allowed(x):
if Is_Win(x):
continue
for i in range(MaxPicks):
if Is_Allowed(x):
if Is_Win(x):
continue
Calls the main function
x = Time_It(Cycle)
print(x)
writes the probabilities to a text file
with open('result.txt','w') as file:
# file.write(pickle.dumps(x))
for item in x:
file.write(str(item) + ',' + str(x[item]) + '\n')

My coworker created a similar script in R without the functions and his script takes 1/100th the time that mine takes.
Two easy optimizations:
1) In-line the function calls like Is_Allowed() because Python have a lot of function call overhead (such as creating a new stackframe and argument tuples).
2) Run the code in using pypy which is really good at optimizing functions like this one.

Ok, this time I hope I got your problem right:)
There are two insights (I guess you have them, just for the sake of the completeness) needed in order to speed up your program algorithmically:
The probabilities for the sequence (card_1, card_2) and (card_2, card_1) are not equal, so we cannot use the results from the urn problem, and it looks like we need to try out all permutations.
However, given a set of cards we picked so far, we don't really need the information in which sequence they where picked - it is all the same for the future course of the game. So it is enough to use dynamic programming and calculate the probabilities for every subset to be traversed during the game (thus we need to check 2^N instead of N! states).
For a set of picked cards set the probability to pick a card i in the next turn is:
norm:=sum Wi for i in set
P(i|set)=Wi/norm if i not in set else 0.0
The recursion for calculating P(set) - the probability that a set of picked card occured during the game is:
set_without_i:=set/{i}
P(set)=sum P(set_without_i)*P(i|set_without_i) for i in set
However this should be done only for set_without_i for which the game not ended yet, i.e. no group has 3 cards picked.
This can be done by means of recursion+memoization or, as my version does, by using bottom-up dynamic programming. It also uses binary representation of integers for representations of sets and (most important part!) returns the result almost instantly [('Grand', 0.0014104762718021384), ('Major', 0.0028878988709489244), ('Minor', 0.15321793072867956), ('Bonus', 0.84248369412856905)]:
#calculates probability to end the game with 3 cards of a type
N=12
#set representation int->list
def decode_set(encoded):
decoded=[False]*N
for i in xrange(N):
if encoded&(1<<i):
decoded[i]=True
return decoded
weights = [170000, 170000, 105, 170000, 170000, 215, 150000, 150000, 12000, 105000, 105000, 105000]
def get_probs(decoded_set):
denom=float(sum((w for w,is_taken in zip(weights, decoded_set) if not is_taken)))
return [w/denom if not is_taken else 0.0 for w,is_taken in zip(weights, decoded_set)]
def end_group(encoded_set):
for i in xrange(4):
whole_group = 7<<(3*i) #7=..000111, 56=00111000 and so on
if (encoded_set & whole_group)==whole_group:
return i
return None
#MAIN: dynamic program:
MAX=(1<<N)#max possible set is 1<<N-1
probs=[0.0]*MAX
#we always start with the empty set:
probs[0]=1.0
#building bottom-up
for current_set in xrange(MAX):
if end_group(current_set) is None: #game not ended yet!
decoded_set=decode_set(current_set)
trans_probs=get_probs(decoded_set)
for i, is_set in enumerate(decoded_set):
if not is_set:
new_set=current_set | (1<<i)
probs[new_set]+=probs[current_set]*trans_probs[i]
#filtering wins:
group_probs=[0.0]*4
for current_set in xrange(MAX):
group_won=end_group(current_set)
if group_won is not None:
group_probs[group_won]+=probs[current_set]
print zip(["Grand", "Major", "Minor", "Bonus"], group_probs)
Some explanation of the "tricks" used in code:
A pretty standard trick is to use integer's binary representation to encode a set. Let's say we have objects [a,b,c], so we could represent the set {b,c} as 110, which would mean a (first in the list corresponds to 0- the lowest digit) - not in the set, b(1) in the set, c(1) in the set. However, 110 read as integer it is 6.
The current_set - for loop simulates the game and best understood while playing. Let's play with two cards [a,b] with weights [2,1].
We start the game with an empty set, 0 as integer, so the probability vector (given set, its binary representation and as integer mapped onto probability):
probs=[{}=00=0->1.0, 01={a}=1->0.0, {b}=10=2->0.0, {a,b}=11=3->0.0]
We process the current_set=0, there are two possibilities 66% to take card a and 33% to take cardb, so the probabilities become after the processing:
probs=[{}=00=0->1.0, 01={a}=1->0.66, {b}=10=2->0.33, {a,b}=11=3->0.0]
Now we process the current_set=1={a} the only possibility is to take b so we will end with set {a,b}. So we need to update its (3={a,b}) probability via our formula and we get:
probs=[{}=00=0->1.0, 01={a}=1->0.66, {b}=10=2->0.33, {a,b}=11=3->0.66]
In the next step we process 2, and given set {b} the only possibility is to pick card a, so probability of set {a,b} needs to be updated again
probs=[{}=00=0->1.0, 01={a}=1->0.66, {b}=10=2->0.33, {a,b}=11=3->1.0]
We can get to {a,b} on two different paths - this could be seen in our algorithm. The probability to go through set {a,b} at some point in our game is obviously 1.0.
Another important thing: all paths that leads to {a,b} are taken care of before we process this set (it would be the next step).

Edit: I misunderstood the original problem, the here presented solution is for the following problem:
Given 4 groups with 3 different cards with a different score for every card, we pick up cards as long as we don't have picked 3 cards from the same group. What is the expected score(sum of scores of picked cards) in the end of the game.
I leave the solution as it is, because it was such a joy to work it out after so many probability-theory-less years and I just cannot delete it:)
See my other answer for handling of the original problem
There are two possibilities to improve the performance: making the code faster (and before starting this, one should profile in order to know which part of the program should be optimized, otherwise the time is spent optimizing things that don't count) or improving the algorithm. I propose to do the second.
Ok, this problem seems to be more complex as at the first site. Let's start with some observations.
All you need to know is the expected number of the picked cards at the end of the game:
If Pi is the probability that the card i is picked somewhere during the game, then we are looking for the expected value of the score E(Score)=P1*W1+P2*W2+...Pn*Wn. However, if we look at the cards of a group, we can state that because of the symmetry the probabilities for the cards of this group are the same, e.g. P1=P2=P3=:Pgrand in your case. Thus our expectation can be calculated:
E(Score)=3*Pgrand*(W1+W2+W3)/3+...+3*Pbonus*(W10+W11+W12)/3
We call averageWgrand:=(W1+W2+W3)/3 and note that E(#grand)=3*Pgrand - the expected number of picked grand card at the end of the game. With this our formula becomes:
E(Score)=E(#grand)*averageWgrand+...+E(#bonus)*averageWbonus
In your example we can go even further: The number of cards in every group is equal, so because of the symmetry we can claim: E(#grand)=E(#minor)=E(#major)=E(#grand)=:(E#group). For the sake of simplicity, in the following we consider only this special case (but the outlined solution could be extended also to the general case). This lead to the following simplification:
E(Score)=4*E(#group)(averageWgrand+...+averageWbonus)/4
We call averageW:=(averageWgrand+...+averageWbonus)/4 and note that E(#cards)=4*E(#grand) is the expected number of picked card at the end of the game.
Thus, E(Score)=E(#cards)*averageW, so our task is reduced to calculating the expected value of the number of cards at the end of the game:
E(#cards)=P(1)*1+P(2)*2+...P(n)*n
where P(i) denotes the probability, that the game ends with exact i cards. The probabilities P(1),P(2) and P(k), k>9 are easy to see - they are 0.
Calculation of the probability of ending the game with i picked cards -P(i):
Let's play a slightly different game: we pick exactly i cards and win if and only if:
There is exactly one group with 3 cards picked. We call this group full_group.
The last picked (i-th) card was from the full_group.
It is easy to see, that the probability to win this game P(win) is exactly the probability we are looking for - P(i). Once again we can use the symmetry, because all groups are equal (P(win, full=grand) means the probability that we what and that the full_group=grand):
P(win)=P(win, grand)+P(win, minor)+P(win, major)+P(win, bonus)
=4*P(win, grand)
P(win, grand) is the probability that:
after picking i-1 cards the number of picked grand cards is 2, i.e. `#grand=2' and
after picking i-1 cards, for every group the number of picked cards is less than 3 and
we pick a grand-card in the last round. Given the first two constraints hold, this (conditional) probability is 1/(n-i+1) (there are n-i+1 cards left and only one of them is "right").
From the urn problem we know the probability for
P(#grand=u, #minor=x, #major=y, #bonus=z) = binom(3,u)*binom(3,x)*binom(3,y)*binom(3,z)/binom(12, u+x+y+z)
with binom(n,k)=n!/k!/(n-k)!. Thus P(win, grand) can be calculated as:
P(win, grand) = 1/(n-i+1)*sum P(#grand=2, #minor=x, #major=y, #bonus=z)
where x<=2, y<=2, z<=2 and 2+x+y+z=i-1
And now the code:
import math
def binom(n,k):
return math.factorial(n)//math.factorial(k)//math.factorial(n-k)
#expected number of cards:
n=12 #there are 12 cards
probs=[0]*n
for minor in xrange(3):
for major in xrange(3):
for bonus in xrange(3):
i = 3 + minor +major +bonus
P_urn = binom(3,2)*binom(3,minor)*binom(3,major)*binom(3,bonus)/float(binom(n, n-i+1))
P_right_last_card = 1.0/(n-i+1)
probs[i]+=4*P_urn*P_right_last_card #factor 4 from symmetry
print "Expected number of cards:", sum((prob*card_cnt for card_cnt, prob in enumerate(probs)))
As result I get 6.94285714286 as the expected number of cards in the end of the game. And very fast - almost instantly. Not sure whether it is right though...
Conclusion:
Obviously, if you like to handle a more general case (more groups, number cards in a group different) you have to extend the code (recursion, memoization of binom) and the theory.
But the most crucial part: with this approach you (almost) don't care in which order the cards were picked - and thus the number of states you have to inspect is down by factor of (k-1)! where k is the maximal possible number of cards in the end of the game. In your example k=9 and thus the approach is faster by factor 40000 (I don't even consider the speed-up from the exploited symmetry, because it might not be possible in general case).

connect 4 minimax algorithm: one for loop

I'm trying to write the minimax algorithm in python with one for loop (yes I know wikipedia says the min and max players are often treated separately), and I'm using the variable turn to keep track of whether the min or max player is currently exploring options. I think, however, that at present the code wrongly evaluates for X when it is the O player's turn and O when it is the X player's turn.
Here's the source (p12) : http://web.cs.wpi.edu/~rich/courses/imgd4000-d10/lectures/E-MiniMax.pdf
Things you might be wondering about:
b is a list of lists; 0 denotes an available space
evaluate is used both for checking for a victory (by default) as well as for scoring the board for a particular player (we look for places where the value of a cell on the board ).
makeMove returns the row of the column the piece is placed in (used for subsequent removal)
Any help would be very much appreciated. Please let me know if anything is unclear.
def minMax(b, turn, depth=0):
player, piece = None, None
best, move = None, -1
if turn % 2 == 0 : # even player is max player
player, piece = 'max', 'X'
best, move = -1000, -1
else :
player, piece = 'min', 'O'
best, move = 1000, -1
if boardFull(b) or depth == MAX_DEPTH:
return evaluate(b, False, piece)
for col in range(N_COLS):
if possibleMove(b, col) :
row = makeMove(b, col, piece)
turn += 1 # now the other player's turn
score = minMax(b, turn, depth+1)
if player == 'max':
if score > best:
best, move = score, col
else:
if score < best:
best, move = score, col
reset(b, row, col)
return move
#seaotternerd. Yes I was wondering about that. But I'm not sure that is the problem. Here is one printout. As you can see, X has been dropped in the fourth column by AI but is evaluating from the min player's perspective (it counts 2 O units in the far right column).
Here's what the evaluate function determines, depending on piece:
if piece == 'O':
return best * -25
return best * 25

You are incrementing turn every time that you find a possible move and not undoing it. As a result, when control returns to a given minMax call, turn is 1 greater than it was before. Then, the next time your program finds a possible move, it increments turn again. This will cause the next call to minMax to select the wrong player as the current one. Overall, I believe this will result in the board getting evaluated for the wrong player approximately half the time. You can fix this by adding 1 to turn in the recursive call to minMax(), rather than by changing the value stored in the variables:
row = makeMove(b, col, piece)
score = minMax(b, turn+1, depth+1)
EDIT: Digging deeper into your code, I'm finding a number of additional problems:
MAX_DEPTH is set to 1. This will not allow the ai to see its own next move, instead forcing it to make decisions solely based on getting in the way of the other player.
minMax() returns the score if it has reached MAX_DEPTH or a win condition, but otherwise it returns a move. This breaks propagation of the score back up the recursion tree.
This is not critical, but it's something to keep in mind: your board evaluation function only takes into account how long a given player's longest string is, ignoring how the other player is doing and any other factors that may make one placement better than another. This mostly just means that your AI won't be very "smart."
EDIT 2: A big part of the problem with the way that you're keeping track of min and max is in your evaluation function. You check to see if each piece has won. You are then basing the score of that board off of who the current player is, but the point of having a min player and a max player is that you don't need to know who the current player is to evaluate the board. If max has won, the score is infinity. If min has won, the score is -infinity.
def evaluate(b, piece):
if evaluate_aux(b, True, 'X'):
return 100000
if evaluate_aux(b, True, 'O'):
return -100000
return evaluate_aux(b, False, piece)
In general, I think there is a lot that you could do to make your code cleaner and easier to read, which would make it a lot easier to detect errors. For instance, if you are saying that "X" is always max and "Y" is always min, then you don't need to bother keeping track of both player and piece. Additionally, having evaluate_aux sometimes return a boolean and sometimes return an int is confusing. You could just have it count the number of each piece in a row, for instance, with contiguous "X"s counting positive and contiguous "O"s counting negative and sum the scores; an evaluation function isn't supposed to be from one player's perspective or the other. Obviously you would still need to have a check for win conditions in there. This would also address point 3.
It's possible that there are more problems, but like I said, this code is not particularly easy to wade through. If you fix the things that I've already found and clean it up, I can take another look.

Minimax explanation "for dummies"

I'm quite new to algorithms and i was trying to understand the minimax, i read a lot of articles,but i still can't get how to implement it into a tic-tac-toe game in python.
Can you try to explain it to me as easy as possible maybe with some pseudo-code or some python code?.
I just need to understand how it works. i read a lot of stuff about that and i understood the basic, but i still can't get how it can return a move.
If you can please don't link me tutorials and samples like (http://en.literateprograms.org/Tic_Tac_Toe_(Python)) , i know that they are good, but i simply need a idiot explanation.
thank you for your time :)

the idea of "minimax" is that there in a two-player game, one player is trying to maximize some form of score and another player is trying to minimize it. For example, in Tic-Tac-Toe the win of X might be scored as +1 and the win of O as -1. X would be the max player, trying to maximize the final score and O would be the min player, trying to minimize the final score.
X is called the max player because when it is X's move, X needs to choose a move that maximizes the outcome after that move. When O players, O needs to choose a move that minimizes the outcome after that move. These rules are applied recursively, so that e.g. if there are only three board positions open to play, the best play of X is the one that forces O to choose a minimum-value move whose value is as high as possible.
In other words, the game-theoretic minimax value V for a board position B is defined as
V(B) = 1 if X has won in this position
V(B) = -1 if O has won in this position
V(B) = 0 if neither player has won and no more moves are possible (draw)
otherwise
V(B) = max(V(B1), ..., V(Bn)) where board positions B1..Bn are
the positions available for X, and it is X's move
V(B) = min(V(B1), ..., V(Bn)) where board positions B1..Bn are
the positions available for O, and it is O's move
The optimal strategy for X is always to move from B to Bi such that V(Bi) is maximum, i.e. corresponds to the gametheoretic value V(B), and for O, analogously, to choose a minimum successor position.
However, this is not usually possible to calculate in games like chess, because in order to calculate the gametheoretic value one needs to enumerate the whole game tree until final positions and that tree is usually extremely large. Therefore, a standard approach is to coin an "evaluation function" that maps board positions to scores that are hopefully correlated with the gametheoretic values. E.g. in chess programs evaluation functions tend to give positive score for material advantage, open columns etc. A minimax algorithm them minimaximizes the evaluation function score instead of the actual (uncomputable) gametheoretic value of a board position.
A significant, standard optimization to minimax is "alpha-beta pruning". It gives the same results as minimax search but faster. Minimax can be also casted in terms of "negamax" where the sign of the score is reversed at every search level. It is just an alternative way to implement minimax but handles players in a uniform fashion. Other game tree search methods include iterative deepening, proof-number search, and more.

Minimax is a way of exploring the space of potential moves in a two player game with alternating turns. You are trying to win, and your opponent is trying to prevent you from winning.
A key intuition is that if it's currently your turn, a two-move sequence that guarantees you a win isn't useful, because your opponent will not cooperate with you. You try to make moves that maximize your chances of winning and your opponent makes moves that minimize your chances of winning.
For that reason, it's not very useful to explore branches from moves that you make that are bad for you, or moves your opponent makes that are good for you.

Following a Dynamic Score

I have little to no formal discrete math training, and have run into a wee bit of an issue. I am trying to write an agent which reads in a human player's (arbitrary) score and scores a point every so often. The agent needs to "lag behind" and "catch up" every so often, so that the human player believes there is some competition going on. Then, the agent must either win or lose (depending on the condition) against the human.
I have tried a few different techniques, including a wonky probabilistic loop (which failed horribly). I was thinking that this problem calls for something like an emission Hidden Markov Model (HMM), but I'm not sure how to implement it (or even whether this is the best approach).
I have a gist up, but again, it sucks.
I hope the __main__ function provides some insight as to the goal of this agent. It is going to be called in pygame.

I think you may be over-thinking this. You can use simple probability to estimate how often and by how much the computer's score should "catch-up". Additionally, you can calculate the difference between the computer's score and human's score, and then feed this to a sigmoid-like function to give you the degree at which the computer's score increases.
Illustrative Python:
#!/usr/bin/python
import random, math
human_score = 0
computer_score = 0
trials = 100
computer_ahead_factor = 5 # maximum amount of points the computer can be ahead by
computer_catchup_prob = 0.33 # probability of computer catching up
computer_ahead_prob = 0.5 # probability of computer being ahead of human
computer_advantage_count = 0
for i in xrange(trials):
# Simulate player score increase.
human_score += random.randint(0,5) # add an arbitrary random amount
# Simulate computer lagging behind human, by calculating the probability of
# computer jumping ahead based on proximity to the human's score.
score_diff = human_score - computer_score
p = (math.atan(score_diff)/(math.pi/2.) + 1)/2.
if random.random() < computer_ahead_prob:
computer_score = human_score + random.randint(0,computer_ahead_factor)
elif random.random() < computer_catchup_prob:
computer_score += int(abs(score_diff)*p)
# Display scores.
print 'Human score:',human_score
print 'Computer score:',computer_score
computer_advantage_count += computer_score > human_score
print 'Effective computer advantage ratio: %.6f' % (computer_advantage_count/float(trials),)

I am making the assumption that the human cannot see the computer agent playing the game.  If this is the case, here is one idea you might try.
Create a list of all the possible point combinations that can be scored for any given move.  For each move, find a score range which you would like the agent to end up within after the current turn.  Reduce the set of possible move values to only the values which would end the agent in that particular range and randomly select one.  As conditions change for how far behind or ahead you would like the agent to get, simply slide your range appropriately.
If you are looking for something with some kind of built in and researched psychological effects for the human, I cant help you with that.  You will need to define more rules for us if you want something more specific to your situation than this.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.