I have a simple game in which a ball bounces around the screen while the player moves left and right across the screen and can shoot an arrow upwards to pop the ball. Every time the player hits a ball, it bursts and splits into two smaller balls, until the balls reach a minimum size and disappear.
I am trying to solve this game with a genetic algorithm based on the Python NEAT library, following this Flappy Bird tutorial: https://www.youtube.com/watch?v=MMxFDaIOHsE&list=PLzMcBGfZo4-lwGZWXz5Qgta_YNX3_vLS2. There is a configuration file in which I have to specify how many input nodes the network must have. I had thought to give as inputs the player's x coordinate, the distance between the player's x coordinate and the ball's x coordinate, and the distance between the player's y coordinate and the ball's y coordinate.
My problem is that at the beginning of the game there is only one ball, but after a few moves there could be more balls on the screen, so I would need a greater number of input nodes: the more balls there are on the screen, the more input coordinates I have to provide to the network.
So how can I set the number of input nodes in a variable way?
config-feedforward.txt file
"""
# network parameters
num_hidden = 0
num_inputs = 3 #this needs to be variable
num_outputs = 3
"""
Python file:
for index, player in enumerate(game.players):
    balls_array_x = []
    balls_array_y = []
    for ball in game.balls:
        balls_array_x.append(ball.x)
        balls_array_y.append(ball.y)
    output = np.argmax(nets[index].activate(("there may be a number of variable arguments here")))
    # other...
Final code:
import math
import numpy as np

for index, player in enumerate(game.players):
    balls_array_x = []
    balls_array_y = []
    for ball in game.balls:
        balls_array_x.append(ball.x)
        balls_array_y.append(ball.y)
    distance_list = []
    player_x = player.x
    player_y = player.y
    i = 0
    while i < len(balls_array_x):
        dist = math.sqrt((balls_array_x[i] - player_x) ** 2 + (balls_array_y[i] - player_y) ** 2)
        distance_list.append(dist)
        i += 1
    if len(distance_list) > 0:
        nearest_ball = min(distance_list)
        output = np.argmax(nets[index].activate((player.x, player.y, nearest_ball)))
This is a good question, and as far as I can tell from a quick Google search it hasn't been addressed for simple ML algorithms like NEAT.
Traditional input-resizing methods for deep NNs (padding, cropping, RNNs, intermediate layers, etc.) obviously cannot be applied here, since NEAT explicitly encodes each individual neuron and connection.
I am also not aware of any general method/trick to make the input size mutable for the traditional NEAT algorithm, and frankly I don't think there is one. I can think of a couple of changes to the algorithm that would make this possible, but that's of no help to you, I suppose.
In my opinion you therefore have 3 options:
You increase the input size to the maximum number of balls the algorithm should track and set the x-diff/y-diff values of non-existent balls to an otherwise impossible number (e.g. -1). When balls come into existence, you set the values of those x-diff/y-diff input neurons, and you set them back to -1 when they are gone. Then you let NEAT figure it out. Note that the maximum number of balls doesn't have to be the maximum number of displayable balls; it can also be limited to, say, 10, considering only the 10 closest balls. It is also worth thinking about concatenating 2 separate NEAT NNs: the first NN has 2 inputs and 1 output, and the second NN has 1 (player pos) + x (max number of balls) inputs and 2 outputs (left, right). The first NN produces an output for each ball position (and is identical for each ball), and the second NN takes the first NN's outputs and turns them into an action.
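A rough sketch of that first option (MAX_BALLS, the -1 sentinel and the build_inputs helper are illustrative assumptions; num_inputs in the config would then be 1 + 2 * MAX_BALLS):

MAX_BALLS = 10

def build_inputs(player, balls):
    # Consider only the MAX_BALLS closest balls.
    closest = sorted(balls, key=lambda b: (b.x - player.x) ** 2 + (b.y - player.y) ** 2)[:MAX_BALLS]
    inputs = [player.x]
    for ball in closest:
        inputs.extend([ball.x - player.x, ball.y - player.y])
    # Pad the slots of non-existent balls with the "impossible" sentinel value.
    inputs.extend([-1] * (2 * (MAX_BALLS - len(closest))))
    return inputs

# output = np.argmax(nets[index].activate(build_inputs(player, game.balls)))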
You only consider 1 ball for each action side (making your input 1 + 2*2). This could be the lowest ball on each side or the closest ball on each side. Such preprocessing can, however, make simple NN tasks quite easy to solve. Maybe you can add inertia to your test environment, introducing a non-linearity that makes it less straightforward to always hurry to the lowest ball.
You input the whole observation space into NEAT (or a uniformly downsampled fraction of it), e.g. the whole game at whatever resolution is lowest but still sensible. I know this observation space is huge, but NEAT works quite well in handling such spaces.
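A rough sketch of that third option, a coarse occupancy grid as the input vector (the grid and screen dimensions are illustrative assumptions; num_inputs would be GRID_W * GRID_H):

GRID_W, GRID_H = 16, 12        # assumed downsampled resolution
SCREEN_W, SCREEN_H = 800, 600  # assumed screen size

def grid_observation(balls):
    # One input per cell: 1.0 if any ball overlaps the cell, else 0.0.
    grid = [0.0] * (GRID_W * GRID_H)
    for ball in balls:
        gx = min(int(ball.x / SCREEN_W * GRID_W), GRID_W - 1)
        gy = min(int(ball.y / SCREEN_H * GRID_H), GRID_H - 1)
        grid[gy * GRID_W + gx] = 1.0
    return grid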
I know that this is not the variable input size option of NEAT that you might have hoped for, but I don't know about any such general option/trick without changing the underlying NEAT algorithm significantly.
However, I am very happy to be corrected if someone knows a better option!
Related
I am making a Python project with the 2D physics engine pymunk, but I am not familiar with pymunk or the base C library it interacts with, Chipmunk2D. I have quite a few different objects that I want to collide with some others but not with certain ones. There is a wall, an anchor point in the wall, a segment attached to the anchor point with a circle on the end, and a car. I want the car to collide ONLY with the wall and the segment, but the wall also needs to collide with the circle on the end of the segment. Other than that I want no collisions. I have tried using groups with the pymunk.ShapeFilter object, but the specific collisions are too complex for groups alone. I searched for a while and found out about categories and masks, but after looking at them I didn't understand: the explanation didn't make much sense to me, and it used bitwise operators, which I don't understand that well. I have been looking for a while but could not find any good tutorial or explanation, so I want to know if someone could explain how it works or point me to some useful resources.
It can look a bit tricky at first, but it is actually quite straightforward, at least as long as you don't have too complicated needs.
With ShapeFilter you set the category that a shape belongs to and which categories it can collide with (the mask property).
Both the categories and the mask are stored as 32-bit integers (for performance), but just think of each as a list of 0s and 1s (at most 32 digits long), where a 1 means that the position is taken by that category. In Python the list is written in binary notation (0bxxxxx), where the x's are the list of 0s and 1s.
Let's say you have 3 categories of things: cars, trees and clouds. Cars can collide with other cars and trees; trees can collide with cars, trees and clouds; and clouds can only collide with trees.
First I define the categories. In this example I only have three categories, so I only use 3 digits, but if I had more I could make it longer (up to 32 digits):
car = 0b100
tree = 0b010
cloud = 0b001
I want the car to collide with itself. I also want it to collide with the tree. That means that the car mask should have 1s at the same positions as the 1s of the car category and the tree category: car_mask = 0b110. The tree can collide with the car, itself and the cloud, so all 3 positions should be set: tree_mask = 0b111. Finally, the cloud can only collide with trees: cloud_mask = 0b010.
Then you need to assign these Shape Filters to the shapes:
car_shape.filter = pymunk.ShapeFilter(category = car, mask=car_mask)
tree_shape.filter = pymunk.ShapeFilter(category = tree, mask=tree_mask)
cloud_shape.filter = pymunk.ShapeFilter(category = cloud, mask=cloud_mask)
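As a side note, you don't have to write the mask bits out by hand: since the mask is just an integer, you can build it by bitwise-OR-ing the categories together. A minimal sketch (the body/shape setup here is illustrative, not from the question):

import pymunk

space = pymunk.Space()
car, tree, cloud = 0b100, 0b010, 0b001

car_body = pymunk.Body(1, 100)
car_shape = pymunk.Circle(car_body, 10)
# car | tree == 0b110, i.e. the same value as car_mask above.
car_shape.filter = pymunk.ShapeFilter(category=car, mask=car | tree)
space.add(car_body, car_shape)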
So I'm scratching my head at this, and I don't know what's wrong. The code is here. The idea is that the AI plays games against itself, and is rewarded for winning or drawing, and punished for losing. Given a board, the AI will choose a move with certain weights. If the game ends up being a win or draw, those moves that it selected will have their weights increased. If the game ends up being a loss, those moves that it selected will have their weights decreased.
Instead what I observe is that 1) The 'X' player (Player 1) will almost always go for either the top left or bottom right square, rather than in the middle as expected, and 2) The 'X' player will become more and more favoured to win as the number of games increases.
I have no idea what is causing this behaviour, and I would appreciate any help.
Apparently Stack Overflow requires you to also include code when linking to Pastebin, so here is the reward bit, although it probably makes more sense in the full context linked above.
foo = ai_player
for i in range(0, len(moves_made)):
    # Find the index of the move made
    weight_index = foo.children.index(moves_made[i])
    # If X won
    if checkWin(current_player.placements) == 1:
        if i % 2 == 0:
            foo.weights[weight_index] += 10
        else:
            foo.weights[weight_index] -= 10
            if foo.weights[weight_index] <= 0: foo.weights[weight_index] = 0
    # If O won
    if checkWin(current_player.placements) == -1:
        if i % 2 == 0:
            foo.weights[weight_index] -= 10
            if foo.weights[weight_index] <= 0: foo.weights[weight_index] = 0
        else:
            foo.weights[weight_index] += 10
    # If it was a draw
    if checkWin(current_player.placements) == 0:
        if i % 2 == 0:
            foo.weights[weight_index] += 5
        else:
            foo.weights[weight_index] += 5
    foo = foo.children[weight_index]
There are a number of issues with your original code for trying to get agents to learn.
You use absolute weights instead of some calculated heuristic for the weights. When updating weights, you want to use an update mechanism (e.g. Q-learning).
Wikipedia Q-Learning
This bounds the board weights within the range of possible rewards and allows new, promising options to rise when given positive rewards (an absolute score of +100 or +200 would otherwise be very hard to beat).
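For reference, a minimal tabular Q-learning update along the lines of that page (the dictionary-based table and the alpha/gamma values are illustrative assumptions, not the exact code I used):

ALPHA, GAMMA = 0.1, 0.9  # assumed learning rate and discount factor

def q_update(Q, state, action, reward, next_state, next_actions):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max((Q.get((next_state, a), 0.0) for a in next_actions), default=0.0)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)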
You assign weightings to board indexes and not board positions. This teaches agents that certain squares of the board are 'the best' and other squares aren't, regardless of their actual utility (e.g. if there are X's on square 0 and square 1, O may think its 'absolute' best move is bottom right, even though X can win next turn).
Moves seem to be random in this environment (as far as I can see). The agents need some way of 'exploring' to find which moves work and which moves don't, and then 'exploiting' the best moves found so far.
You only store the 'state' values and not the value of taking actions in a given state. With this, agents can calculate the value of the state they're in, but they need a way of calculating the value of actions they can take, to inform them of the best actions to take.
In the Reinforcement Learning framework, the agent takes in the state and needs to evaluate the best possible action to take, given the state.
I've re-written your code using a Q-learning table to get the agents moving appropriately (and moving the agents into a class), also using NumPy for argmax and similar functions.
The Q-learning function is:
self.Q_table[(state, action)] = q_val + 0.1 * \
(reward + self.return_value[(state, action)] / self.return_number[(state, action)])
Code is in a pastebin.
With Epsilon of 0.08 and 1,000,000 episodes, the win rate for X was ~16%, Y was ~7% and about ~77% draws.
With Epsilon of 0.04 and 1,000,000 episodes, the win rate for X was ~9%, Y was ~3% and about ~88% draws.
With Epsilon of 0.04 and 2,000,000 episodes - almost identical results to 1m episodes.
It's hard to tell just from this, but possibly you reward them for playing draws and it keeps happening?
Also, you should evaluate checkWin() once at the start, and you could make a simple function like update(weight) and pass it 10, -10 or 5 depending on the case.
Is your algorithm:
0. Init all weights to 100 for every board state
1. Reset the board
2. Get the board state
3. Make a random (weighted) move depending on the weights/board state
4. Check if the game ends; if so, update the weights and go back to 1, else go to 2
?
I made the original Battleship and now I'm looking to upgrade my AI from random guessing to guessing statistically probable locations. I'm having trouble finding algorithms online, so my question is: what kinds of algorithms already exist for this application, and how would I implement one?
Ships: 5, 4, 3, 3, 2
Field: 10x10
Board:
OCEAN = "O"
FIRE = "X"
HIT = "*"
SIZE = 10
SEA = []  # Blank board
for x in range(SIZE):
    SEA.append([OCEAN] * SIZE)
If you'd like to see the rest of the code, I posted it here: (https://github.com/Dbz/Battleship/blob/master/BattleShip.py); I didn't want to clutter the question with a lot of irrelevant code.
The ultimate naive solution would be to go through every possible placement of ships (legal, given what is known) and count the number of times each square is occupied.
Obviously, on a relatively empty board this will not work, as there are too many permutations, but a good start might be:
For each square on the board, go through all the ships and count in how many different ways each fits over that square, i.e. for each square of the ship's length, check whether it fits horizontally and vertically.
An improvement might be to also check, for each possible ship placement, whether the rest of the ships can be placed legally while covering all known 'hits' (places known to contain a ship).
To improve performance: if only one ship can be placed in a given spot, you no longer need to test it on other spots. Also, when there are many 'hits', it might be quicker to first cover all known 'hits' and, for each possible cover, go through the rest.
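A minimal sketch of the basic counting idea (assuming a 10x10 board representation where None means 'unknown' and False means 'known miss'):

SIZE = 10
SHIPS = [5, 4, 3, 3, 2]

def placement_counts(board):
    # counts[x][y] = number of still-legal ship placements covering (x, y).
    counts = [[0] * SIZE for _ in range(SIZE)]
    for length in SHIPS:
        for x in range(SIZE):
            for y in range(SIZE):
                # Horizontal placement starting at (x, y).
                if y + length <= SIZE and all(board[x][y + i] is not False for i in range(length)):
                    for i in range(length):
                        counts[x][y + i] += 1
                # Vertical placement starting at (x, y).
                if x + length <= SIZE and all(board[x + i][y] is not False for i in range(length)):
                    for i in range(length):
                        counts[x + i][y] += 1
    return counts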
Edit: you might want to look into DFS.
Edit: Elaboration on OP's (@Dbz) suggestion in the comments:
Hold a set of dismissed ship placements ('dismissed'; a placement can be represented as a string, say "4V5x3" for the placement of a length-4 vertical ship covering 5x3, 5x4, 5x5, 5x6). After a guess, add all the placements that the guess dismisses. For each square, also hold the set of placements that intersect with it ('placements[x,y]'). Then the probability for a square would be:
(34 - |intersection(placements[x,y], dismissed)|) / (3400 - |dismissed|)
To add to the dismissed list:
1. If the guess at (X,Y) is a miss: add placements[x,y].
2. If the guess at (X,Y) is a hit:
add the neighboring placements (assuming that ships cannot be placed adjacently), i.e. add:
<(2,3a,3b,4,5)>H<X+1>x<Y>, <(2,3a,3b,4,5)>V<X>x<Y+1>
<(2,3a,3b,4,5)>H<X-(2,3,3,4,5)>x<Y>, <(2,3a,3b,4,5)>V<X>x<Y-(2,3,3,4,5)>
2H<X+-1>x<Y+(-2 to 1)>, 3aH<X+-1>x<Y+(-3 to 1)> ...
2V<X+(-2 to 1)>x<Y+-1>, 3aV<X+(-3 to 1)>x<Y+-1> ...
If |intersection(placements[x,y], dismissed)| == 33, i.e. only one placement is possible, add the ship (see below).
3. Check whether any of the previous hits has only one possible placement left; if so, add the ship.
4. Check whether any of the ships has only one possible placement left; if so, add the ship.
Adding a ship:
Add all other placements of that ship to dismissed.
For each (x,y) of the ship's placement, add placements[x,y] without the actual placement.
For each (x,y) of the ship's placement, mark it as a hit guess (if not already known) and run stage 2.
For each (x,y) neighboring the ship's placement, mark it as a miss guess (if not already known) and run stage 1.
Run stages 3 and 4.
I might have overcomplicated this, and there might be some redundant actions, but you get the point.
Nice question, and I like your idea for a statistical approach.
I think I would have tried a machine learning approach for this problem as follows:
First model your problem as a classification problem.
The classification problem is: Given a square (x,y) - you want to tell the likelihood of having a ship in this square. Let this likelihood be p.
Next, you need to develop some 'features'. You can take the surrounding of (x,y) [as you might have partial knowledge on it] as your features.
For example, the features of the middle of the following mini-board (+ indicates the square you want to determine if there is a ship or not in):
OO*
O+*
?O?
can be something like:
f1 = (0,0) = false
f2 = (0,1) = false
f3 = (0,2) = true
f4 = (1,0) = false
(note: skipping (1,1))
f5 = (1,2) = true
f6 = (2,0) = unknown
f7 = (2,1) = false
f8 = (2,2) = unknown
I'd implement the features relative to the point of origin (in this case (1,1)) and not as absolute locations on the board (so, e.g., the square directly above (3,3) would also be f2).
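A small sketch of extracting such relative features (the encoding of the O/*/? squares as False/True/None is my assumption):

def features(board, x, y):
    # The 8 squares around (x, y) in a fixed relative order, skipping (x, y) itself.
    feats = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue
            nx, ny = x + dx, y + dy
            if 0 <= nx < len(board) and 0 <= ny < len(board[0]):
                feats.append(board[nx][ny])  # True (ship) / False (no ship) / None (unknown)
            else:
                feats.append(None)  # off-board squares treated as unknown
    return feats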
Now, create a training set. The training set is a 'labeled' set of features - based on some real boards. You can create it manually (create a lot of boards), automatically by a random generator of placements, or by some other data you can gather.
Feed the training set to a learning algorithm. The algorithm should be able to handle 'unknowns' and be able to give probability of "true" and not only a boolean answer. I think a variation of Naive Bayes can fit well here.
After you have got a classifier - exploit it with your AI.
When it's your turn, choose to fire upon a square which has the maximal value of p. At first, the shots will be kinda random - but with more shots you fire, you will have more information on the board, and the AI will exploit it for better predictions.
Note that I based the features on a square of size 1 around the point. You can of course choose any k and compute the features on that bigger square; it will give you more features, but each might be less informative. There is no rule of thumb for which will be better: it should be tested.
The main question is: how are you going to find the statistically probable locations? Are they already known, or do you want to figure them out?
Either way, I'd just make the grid weighted. In your case, the initial weight for each slot would be 1.0/(SIZE^2). The sum of the weights must equal 1.
You can then adjust weights based on the statistics gathered from N last played games.
Now, when your AI makes a choice, it chooses a coordinate to hit based on the weighted probabilities. A quick and simple way to do that would be:
Generate a random number R in the range [0..1]
Start from slot (0, 0), adding up the weights, i.e. S = W(0, 0) + W(0, 1) + ..., where W(n, m) is the weight of the corresponding slot. Once S >= R, you've got the coordinate to hit.
This can be optimised by pre-calculating cumulative weights for each row, have fun :)
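A quick sketch of that weighted pick (the uniform initial weights match the setup above; pick_target is a hypothetical helper name):

import random

SIZE = 10
weights = [[1.0 / (SIZE * SIZE)] * SIZE for _ in range(SIZE)]  # sums to 1

def pick_target(weights):
    r = random.random()  # R in [0, 1)
    s = 0.0
    for n in range(SIZE):
        for m in range(SIZE):
            s += weights[n][m]
            if s >= r:
                return n, m
    return SIZE - 1, SIZE - 1  # guard against floating-point round-off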
Find out which ships are still alive:
alive = [2,2,3,4] # length of alive ships
Find out spots where you have not shot, for example with a numpy.where()
Loop over spots where you can shoot.
Check the sides of the given position: go left and right, how many free spaces are there? Go up and down, how many? If you can fit a boat in that many spaces, you can also fit any smaller boat, so I'd run this loop from the largest ship downwards, adding a +1 to the count at this position for the fitting ship and for every smaller one.
Once you have done all of this, the position with the most points should be the most probable one to attack and hit something.
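A rough sketch of this side-checking loop (assuming unknown is a SIZE x SIZE boolean grid of spots not yet shot at):

def fit_scores(unknown, alive=(5, 4, 3, 3, 2)):
    size = len(unknown)
    scores = [[0] * size for _ in range(size)]
    for x in range(size):
        for y in range(size):
            if not unknown[x][y]:
                continue
            # Lengths of the free horizontal and vertical runs through (x, y).
            left = right = up = down = 0
            while y - left - 1 >= 0 and unknown[x][y - left - 1]:
                left += 1
            while y + right + 1 < size and unknown[x][y + right + 1]:
                right += 1
            while x - up - 1 >= 0 and unknown[x - up - 1][y]:
                up += 1
            while x + down + 1 < size and unknown[x + down + 1][y]:
                down += 1
            # Each alive ship that fits through this spot scores a point.
            scores[x][y] += sum(1 for s in alive if s <= left + right + 1)
            scores[x][y] += sum(1 for s in alive if s <= up + down + 1)
    return scores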
Of course, it can get as complicated as you want. You can also ask yourself, instead of which is my next hit, which combinations of hits will give me the victory in less number of hits or any other combination/parametrization of the problem. Good luck!
I'm making a PONG game for a school project using Kivy in Python. So far, thanks to this forum, I've made myself some AI for the NPC paddle.
This is the code:
if self.ball.y < self.player2.center_y:
    self.player2.center_y = self.player2.center_y - 4
if self.ball.y > self.player2.center_y:
    self.player2.center_y = self.player2.center_y + 4
This is in a method of the PongGame() class called ArtificialIntelligence().
I use this to call it:
Clock.schedule_interval(game.ArtificialIntelligence, 1/300)
This allows me to call it once every 1/300th of a second. However, for anything beyond 1/300 I see no difference; i.e. 1/9001 does not call it once every 1/9001th of a second.
The way it works is that it moves the paddle 4 pixels in y toward the ball's position, and it does this once every 1/300th of a second, which is why it doesn't "lag" at this rate. This is basically an "easy" mode for the player. If I want a "hard" mode, I need to make the NPC more accurate. I can do this with
self.player2.center_y = self.player2.center_y + 20
Something like this would be extremely accurate. HOWEVER, it does NOT look "fluid"; it looks "laggy". I assume I could get the same amount of movement by calling the method more often instead of changing the distance moved per call. However, I don't know how to do that because, as I said, changing the interval to anything above 1/300 seems to make no difference.
This is how I use my paddle:
if touch.x < self.width/3:
    self.player1.center_y = touch.y
and I can move it as fast as I want because this updates as I move the mouse. And it looks fluid because it updates as often as it needs to update. I don't know how to do this with my AI.
Does anyone know how I could basically make the NPC paddle more accurate, allowing me to do Easy-Normal-Hard, etc, while retaining fluidity and no lag? I see only one way I could do it: Increase the amount the method is called.
However, I bet there is a better way and I don't know how to do it. Does anyone have any idea how I could do this? Thanks.
Edit: it looks like I can do it like this:
Clock.schedule_interval(game.ArtificialIntelligence, 1/300)
Clock.schedule_interval(game.ArtificialIntelligence, 1/300)
Clock.schedule_interval(game.ArtificialIntelligence, 1/300)
Clock.schedule_interval(game.ArtificialIntelligence, 1/300)
Clock.schedule_interval(game.ArtificialIntelligence, 1/300)
But that seems REALLY ugly and REALLY inefficient... I'd MUCH prefer a cleaner way.
At 300 frames per second, the problem is not in the rate of updates because you are exceeding the human eye's capacity to perceive movement by a factor of 50 or more.
The jerky movement comes about because the ball follows a linear trajectory while your paddle just hops to where the ball is now. Ideally, your computer player would compute where the ball will be when it crosses the plane of the computer's paddle and then take a very smooth course to that location, at 30 frames per second or less. Sure, the prediction math requires a tiny amount of trigonometry, but it is the "right" way to do it, in the sense that it is how a good player would play: by anticipating.
It would be far easier to just increase the size of the computer's paddle which would also give a visual indication to the human player of just how much harder the game is. When the computer's paddle has become a wall, the player would see that there is no winning to be done. The larger paddle would have the side effect of being less jerky, but whether this is a "good" way is your decision.
My advice is to use trig to work out where the ball will be, and have the paddle move there to intercept it.
Animation will be smooth at 30 frames per second.
When making a game AI, it is quite important these days that the player does not see it "cheating", and giving it a larger paddle, or the ability to teleport, would be obvious signs of this. It should play in the same manner as a human, just better, not using some mechanism the player has no access to. "This game sucks because the CPU cheats" is a very common negative comment on videogame forums.
So if you need the computer to miss, make sure its trig calculations are off by a random factor, so the player can't distinguish its play from a human's.
Edit: For example: if random(X) <= speed of ball, then intercept correctly; otherwise miss by random(Y) units.
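A small sketch of that idea (x and y here are tunable difficulty parameters of my choosing, not values from the answer):

import random

def target_y(predicted_y, ball_speed, x=10.0, y=80.0):
    # The faster the ball, the likelier a clean intercept.
    if random.uniform(0, x) <= ball_speed:
        return predicted_y  # intercept correctly
    return predicted_y + random.uniform(-y, y)  # miss by up to y units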
Thanks for your help guys, I was able to work it out with my teacher based on your help and his.
He developed this algorithm (to be honest, he did it so fast that I wasn't really able to follow it!), which essentially uses trig to work out where the paddle should go (note: I don't use the angle, as I have other values that can be used):
def AIController(self, *args):
    ballsFuturePos = self.ball.center_y + ((self.width - self.ball.center_x) / self.ball.velocity_x) * self.ball.velocity_y
    numIterations = ((self.width - self.ball.center_x) / self.ball.velocity_x)
    #print ballsFuturePos
    #print numIterations
    if numIterations > 0:
        self.wantedPos = self.player2.center_y + (ballsFuturePos - self.player2.center_y) / numIterations
        #print wantedPos
        #self.player2.center_y = wantedPos + (error / wantedPos) * 100
        if self.player2.center_y < self.wantedPos:
            self.player2.center_y = self.player2.center_y + 9
        if self.player2.center_y > self.wantedPos:
            self.player2.center_y = self.player2.center_y - 9
So I work out where the ball is going to hit the rightmost part of the screen by taking the ball's y position and adding ((width - ball's x position) / x velocity) * y velocity. (width - ball's x position) gives how far the ball is from the rightmost part of the screen in pixels; dividing by the x velocity and multiplying the whole thing by the y velocity gives the slope (I think), and once the slope is known the trajectory is known too, so it can predict where the ball will hit the edge of the screen.
I calculate the number of iterations needed to reach the ball by taking (width - ball's x position), i.e. how far it is from the rightmost part of the screen, and dividing that by the x velocity. This gives me a number I can iterate with.
Now I iterate with this and create a variable called wantedPos, which is where I want my paddle to go. It takes the paddle's y position and adds (where the ball will be - where the paddle is), i.e. the distance between the ball and the paddle, divided by the number of iterations, which gives the position the paddle should be at to meet the ball in time. As numIterations decreases on each call, the gap gets smaller. I then step the paddle toward that target using 9 as the speed, which lets me increase or decrease the difficulty.
Thanks. If I messed up any of my logic in trying to describe his actions, please tell! I think I understand it but a confirmation would be nice :)
I'm quite new to algorithms, and I was trying to understand minimax. I've read a lot of articles, but I still can't work out how to implement it in a tic-tac-toe game in Python.
Can you try to explain it to me as simply as possible, maybe with some pseudo-code or some Python code?
I just need to understand how it works. I've read a lot of material about it and I understand the basics, but I still can't see how it can return a move.
If you can, please don't link me to tutorials and samples like http://en.literateprograms.org/Tic_Tac_Toe_(Python) - I know they are good, but I simply need an idiot-proof explanation.
Thank you for your time :)
The idea of "minimax" is that in a two-player game, one player is trying to maximize some form of score and the other player is trying to minimize it. For example, in tic-tac-toe the win of X might be scored as +1 and the win of O as -1. X would be the max player, trying to maximize the final score, and O would be the min player, trying to minimize it.
X is called the max player because when it is X's move, X needs to choose the move that maximizes the outcome after that move; when O plays, O needs to choose the move that minimizes the outcome. These rules are applied recursively, so that, e.g., if there are only three board positions open to play, the best play for X is the one that forces O to choose a minimum-value move whose value is as high as possible.
In other words, the game-theoretic minimax value V of a board position B is defined as:
V(B) = +1 if X has won in this position
V(B) = -1 if O has won in this position
V(B) =  0 if neither player has won and no more moves are possible (a draw)
otherwise:
V(B) = max(V(B1), ..., V(Bn))   if it is X's move, where B1..Bn are the positions X can move to
V(B) = min(V(B1), ..., V(Bn))   if it is O's move, where B1..Bn are the positions O can move to
The optimal strategy for X is always to move from B to a Bi such that V(Bi) is maximal, i.e. equals the game-theoretic value V(B); for O, analogously, it is to choose a minimum-value successor position.
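A minimal sketch of this recursive definition in Python (winner() and moves() are hypothetical helpers: winner() returns +1, -1 or 0 for a finished game and None otherwise, and moves(board, player) yields the successor positions for that player); best_move is what actually returns a move:

def minimax(board, player):  # player is +1 for X, -1 for O
    w = winner(board)
    if w is not None:
        return w  # +1, -1 or 0: the game is over
    values = [minimax(b, -player) for b in moves(board, player)]
    return max(values) if player == +1 else min(values)

def best_move(board, player):
    # Pick the successor position whose value is best for `player`.
    return max(moves(board, player), key=lambda b: player * minimax(b, -player))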
However, this is usually impossible to calculate in games like chess, because to compute the game-theoretic value one needs to enumerate the whole game tree down to the final positions, and that tree is usually extremely large. Therefore, the standard approach is to coin an "evaluation function" that maps board positions to scores that are hopefully correlated with the game-theoretic values. E.g. in chess programs, evaluation functions tend to give a positive score for material advantage, open files, etc. A minimax algorithm then minimaximizes the evaluation-function score instead of the actual (uncomputable) game-theoretic value of a board position.
A significant, standard optimization of minimax is "alpha-beta pruning": it gives the same results as minimax search, but faster. Minimax can also be cast in terms of "negamax", where the sign of the score is flipped at every search level; it is just an alternative way to implement minimax that handles both players in a uniform fashion. Other game-tree search methods include iterative deepening, proof-number search, and more.
Minimax is a way of exploring the space of potential moves in a two player game with alternating turns. You are trying to win, and your opponent is trying to prevent you from winning.
A key intuition is that if it's currently your turn, a two-move sequence that guarantees you a win isn't useful, because your opponent will not cooperate with you. You try to make moves that maximize your chances of winning and your opponent makes moves that minimize your chances of winning.
For that reason, it's not very useful to explore branches from moves that you make that are bad for you, or moves your opponent makes that are good for you.