I would like to generate some data (position of the snake, available moves, distance from the food...) to create a neural network model so that it can be trained on the data to play the snake game. However, I don't know how to do that. My current ideas are:
Play manually (by myself) the game for many iterations and store the data (drawback: I should play the game a lot of times).
Make the snake do some random movements and track their outcomes.
Play the snake with depth-first search or similar algorithms many times and store the data.
Can you suggest some other method, or should I choose one of those? Which one, in that case?
P.S. I don't know if this is the right place to ask such a question. However, I don't know whom or where else to ask, hence I am here.
If using a neural network, start simple. Think inputs and outputs and keep them simple too.
Inputs:
How many squares to the left of the head are free
How many squares to the right of the head are free
How many squares forward of the head are free
Relative position of next food left/right
Relative position of next food forward/back
Length of snake
Keep inputs normalized against their minimum and maximum possible values so that they stay in the range -1.0 to 1.0.
Outputs:
Turn Left
Turn Right
Straight Ahead
(choose the output with highest activation)
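As a rough sketch of the inputs and outputs listed above: the board size, the example values and the hand-picked weights below are all assumptions made up for illustration, and the network is a single layer with one tanh unit per action rather than anything tuned.

```python
import math

ACTIONS = ["left", "right", "straight"]

def normalize(value, lo, hi):
    """Map value from the range [lo, hi] to [-1.0, 1.0]."""
    return 2.0 * (value - lo) / (hi - lo) - 1.0

def choose_action(inputs, weights, biases):
    """Single-layer network: one tanh unit per action, pick the strongest."""
    activations = [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]
    return ACTIONS[activations.index(max(activations))]

# Hypothetical 20x20 board: 3 free squares to the left, 10 to the right,
# 5 ahead, food 4 columns right and 2 rows ahead, snake length 6.
size = 20
inputs = [
    normalize(3, 0, size - 1),             # free squares left
    normalize(10, 0, size - 1),            # free squares right
    normalize(5, 0, size - 1),             # free squares forward
    normalize(4, -(size - 1), size - 1),   # food left/right offset
    normalize(2, -(size - 1), size - 1),   # food forward/back offset
    normalize(6, 1, size * size),          # snake length
]

# Hand-picked weights that only look at the three "free squares" inputs.
weights = [
    [1, 0, 0, 0, 0, 0],  # left
    [0, 1, 0, 0, 0, 0],  # right
    [0, 0, 1, 0, 0, 0],  # straight
]
biases = [0.0, 0.0, 0.0]

action = choose_action(inputs, weights, biases)
```

With these made-up weights the snake simply turns toward the side with the most free squares; a trained network would replace the weights and biases.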
The next problem is training. A typical approach is to use a genetic algorithm over all the weights of the above neural network. This randomizes and tests new versions of the neural network every life. After X attempts, create a new generation and repeat to improve. This is pretty much doing random moves automatically (your second choice).
The next problem is fitness for training. How do you know which neural network is better? You could simply use the length of the snake as the fitness: the longer the snake, the better and more 'fit'.
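A minimal sketch of such a genetic algorithm might look like the following. The population size, mutation rate and "keep the top few" selection scheme are assumptions chosen for illustration, and `toy_fitness` is only a stand-in so the code runs by itself; in the real setup the fitness function would play one game with the given weights and return the final snake length.

```python
import random

def mutate(weights, rate=0.1, scale=0.5, rng=random):
    """Return a copy with each weight perturbed with probability `rate`."""
    return [w + rng.gauss(0, scale) if rng.random() < rate else w
            for w in weights]

def evolve(fitness, n_weights, pop_size=20, generations=30, keep=5, rng=random):
    """Keep the `keep` fittest individuals each generation, refill with mutants."""
    population = [[rng.uniform(-1, 1) for _ in range(n_weights)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:keep]
        children = [mutate(rng.choice(parents), rng=rng)
                    for _ in range(pop_size - keep)]
        population = parents + children
    return max(population, key=fitness)

# Stand-in fitness (hypothetical): higher is better, best possible is 0.
def toy_fitness(weights):
    return -sum(w * w for w in weights)

# 6 inputs x 3 outputs + 3 biases = 21 weights for a single-layer network.
best = evolve(toy_fitness, n_weights=21, rng=random.Random(0))
```

Because the best individuals are carried over unchanged, the best fitness never gets worse from one generation to the next as long as the fitness function is deterministic.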
Related
So I have a DQN Agent that plays the card game Schnapsen. I won't bore you with the details of the game as they are not so related to the question I am about to ask. The only important point is that for every round of the game, there are specific valid moves a player can take. The DQN Agent I have created sometimes outputs non-valid moves, in the form of an integer. There are 28 possible moves in the entire game, so sometimes it will output a move that cannot be played based on the current state of the game, for example playing the Jack of Diamonds when it is not in its hand. I was wondering if there was any way for me to "map" the outputs of the neural network to the most similar valid move in the case that it does not converge? Would that be the best approach to this problem, or do I have to tune the neural network better?
As of right now, whenever the DQN Agent does not output a valid move, it falls back on another algorithm, a Bully Bot implementation that plays one of the possible valid moves. Here is the link to my GitHub repo with the code. To run the code where the DQN Agent plays against a bully bot, just navigate into the executables folder and run: python cli.py bully-bot
One approach to mapping the outputs of your neural network to the most similar valid move would be to use "softmax" to convert the raw outputs of the network into a probability distribution over the possible moves. Then, you could select the move with the highest probability that is also a valid move. Another approach could be to use "argmax" which returns the index of the maximum value in the output. Then you will have to check whether the returned index corresponds to a valid move or not. If not, you can select the next possible index which corresponds to a valid move.
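Both suggestions come down to picking the best-scoring move among the valid ones. Softmax preserves the ordering of the raw outputs, so ranking by probability and ranking by raw output select the same move. A minimal sketch, assuming the network's output is available as a list of 28 scores and the game can supply the indices of the currently valid moves (both names below are hypothetical):

```python
def best_valid_move(q_values, valid_moves):
    """Return the valid move index with the highest network output."""
    if not valid_moves:
        raise ValueError("no valid moves available")
    return max(valid_moves, key=lambda move: q_values[move])

# Example: the raw argmax (index 5) is not playable, so the best
# playable move (index 7) is chosen instead.
q_values = [0.1] * 28
q_values[5] = 0.9
q_values[7] = 0.5
move = best_valid_move(q_values, valid_moves=[2, 7, 11])
```

Restricting the choice this way means the agent never needs to fall back to another bot for an illegal output.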
I'm currently trying to implement a neural network in Python to play the game of Snake that is trained using a genetic algorithm (although that's a separate matter right now).
Every network that plays the game does the same movement over and over (e.g. continues in a straight line, keeps turning left). There are 5 inputs to the network: the distance to an object (food, a boundary, its own tail) in all four directions, and the angle between the food and the direction the snake is facing. The three outputs represent turning left, continuing straight, and turning right.
I've never worked with anything like this before so I have a fairly basic understanding at this point. The number of hidden layers and the number of nodes per layer is variable, and is something I have been altering a lot to test, but the snakes continue to each repeat the exact same motion.
Any advice on why this is happening would be greatly appreciated, and how to fix it. I can show my code if it's useful.
You may have weights initialized to zero. If they aren't trained properly and stay zero for some reason, the neural network will always output just its bias, whatever the input.
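A quick way to check for this is to feed the network two very different inputs and compare the outputs. The sketch below assumes a small network with one tanh hidden layer (the layer sizes and bias values are made up) and shows that with all-zero weights the output equals the output bias regardless of the input, so the highest-activation action never changes.

```python
import math

def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """One tanh hidden layer followed by a linear output layer."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(hidden_w, hidden_b)
    ]
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(out_w, out_b)
    ]

n_in, n_hidden, n_out = 5, 4, 3
hidden_w = [[0.0] * n_in for _ in range(n_hidden)]   # all-zero weights
hidden_b = [0.0] * n_hidden
out_w = [[0.0] * n_hidden for _ in range(n_out)]     # all-zero weights
out_b = [0.2, 0.5, 0.1]                              # made-up biases

out_a = forward([1.0, 0.3, -0.7, 0.9, 0.0], hidden_w, hidden_b, out_w, out_b)
out_b_case = forward([-1.0, -1.0, 1.0, 0.1, 0.5], hidden_w, hidden_b, out_w, out_b)
# Both outputs equal the output bias, so the same action always wins.
```

If your own network gives identical outputs for clearly different inputs, the weights (or the inputs reaching the network) are the first thing to inspect.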
I'm interested in modelling a system that can use openai gym to make a model that not only performs well but hopefully even better yet continuously improves to converge on the best moves.
This is how I initialize the env
import gym
env = gym.make("CartPole-v0")
env.reset()
Stepping the environment returns four values: observation, reward, done and info. info is always empty, so ignore that.
reward I'd hope would signify whether the action taken is good or bad but it always returns a reward of 1 until the game ends, it's more of a counter of how long you've been playing.
The action can be sampled by
action = env.action_space.sample()
which in this case is either 1 or 0.
To put this into perspective for anyone who doesn't know what this game is, here's the link. Its objective is to balance a pole by moving left or right, i.e. by providing an input of 0 or 1.
The observation is the only key way to tell whether you're making a good or bad move.
obs, reward, done, info = env.step(action)
and the observation looks something like this
array([-0.02861881, 0.02662095, -0.01234258, 0.03900408])
As I said before, reward is always 1, so it is not a good indicator of whether a move was good or bad based on the observation. done means the game has come to an end, though I can't tell whether it means you lost or won.
The objective, as you'll see from the linked page, is to balance the pole for a total reward of +195 averaged over 100 games; that's the measure of a successful game. I'm not sure whether that means you've balanced it completely or just lasted long. Still, I've followed a few examples and suggestions to generate a lot of random games and use those that rank well to train a model.
But this way feels sketchy and not inherently aware of what a failing move is i.e. when you're about to tip the pole more than 15 degrees or the cart moves 2.4 units from the center.
I've gathered data from running the simulation over 200000 times, and found a good number of games that lasted more than 80 steps (the goal is 195). I graphed these games in an IPython notebook. Since I'm graphing each observation individually per game, there are too many graphs to put here. I was hoping to see a link between the final observation and the game ending, but since the actions are randomly sampled, the moves are random.
What I thought I saw was maybe for the first observation that if it gets to 0 the game ends but I've also seen some others where the game runs with negative values. I can't make sense of the data even with graphing basically.
What I would really like to know, if possible, is what each value in the observation means, and also whether 0 means left or right, but the latter would be easier to deduce once I understand the former.
It seems you asked this question quite some time ago. However, the answer is that the observation is given by the position of the cart, the angle of the pole and their derivatives. The position in the middle is 0. So the negative is left and positive is right.
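To make that concrete, here is a small sketch that labels the four values. To my knowledge the order in the CartPole source is cart position, cart velocity, pole angle, pole angular velocity, with action 0 pushing the cart left and action 1 pushing it right. The helper names are made up, and the heuristic policy is only an illustration that assumes a positive angle means the pole leans to the right.

```python
from typing import NamedTuple

class CartPoleObs(NamedTuple):
    cart_position: float          # 0 is the middle of the track
    cart_velocity: float
    pole_angle: float             # radians, 0 is upright
    pole_angular_velocity: float

LEFT, RIGHT = 0, 1

def label(obs):
    """Attach names to the four raw observation values."""
    return CartPoleObs(*obs)

def push_toward_lean(obs):
    """Naive heuristic: push the cart toward the side the pole leans to."""
    return RIGHT if label(obs).pole_angle > 0 else LEFT

# The observation quoted in the question.
obs = label([-0.02861881, 0.02662095, -0.01234258, 0.03900408])
action = push_toward_lean(obs)
```

For the quoted observation the cart is slightly left of center and the pole angle is slightly negative, so this heuristic pushes left.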
I just started working on an artificial life simulation (again... I lost the other one) in Python and Pygame using Pybrain, and I'm planning how this is going to work. So far I have an environment with some "food pellets". A food pellet is added every minute. I haven't made my agents (aka "Creatures") yet, but I know I want them to have simple feed-forward neural networks with some inputs, and the outputs will be their movement. I want the inputs to show what's in front of them, sort of like they are seeing the simulated world in front of them. How should I go about this? I either want them to actually "see" the colors in their line of vision, or just input the nearest object into their NN. Which one would be best, and how would I implement them?
Having a full field of vision is technically possible in a neural network, but requires a LOT of inputs and massive processing; not a direction you should expect to be able to evolve in any kind of meaningful way.
A neural network deals with values and thresholds. I'd recommend using two inputs associated with the nearest individual: one for its distance and the other for its angle (with zero being directly ahead, less than zero being on the left and greater than zero being on the right).
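A minimal sketch of computing those two inputs, assuming positions are (x, y) pairs with the y-axis pointing up and the heading is an angle in radians measured counter-clockwise from the x-axis. In Pygame's screen coordinates the y-axis points down, so the sign of the angle would flip. The function name is made up for this example.

```python
import math

def nearest_object_inputs(x, y, heading, objects):
    """Return (distance, relative_angle) to the nearest object.

    relative_angle is in radians: 0 is straight ahead, negative is to
    the left, positive is to the right (with the y-axis pointing up).
    """
    ox, oy = min(objects, key=lambda o: math.hypot(o[0] - x, o[1] - y))
    dx, dy = ox - x, oy - y
    distance = math.hypot(dx, dy)
    bearing = math.atan2(dy, dx)
    relative = heading - bearing
    # Wrap into [-pi, pi] so "slightly left" and "slightly right" stay small.
    relative = math.atan2(math.sin(relative), math.cos(relative))
    return distance, relative

# Creature at the origin facing along +x; the nearest pellet is directly
# to its left, the other one is further away on its right.
distance, angle = nearest_object_inputs(0.0, 0.0, 0.0, [(0.0, 1.0), (0.0, -2.0)])
```

Dividing the distance by the maximum viewing range and the angle by pi would put both inputs on a comparable scale.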
Make sure that these values are easy to process into outputs. For example, if one output goes to a rotation actuator, make sure that the input values and output values are on the same scale. Then it will be easy to both turn toward or away from a particular individual.
If you want them to be able to see multiple individuals, simply include multiple pairs of inputs. I was going to suggest putting them in distance order, but it might be easier for them if, as soon as an organism sees something, it always comes in on the same inputs until it's no longer tracked.
I am trying to build a simple evolution simulation of agents controlled by neural networks. In the current version each agent has a feed-forward neural net with one hidden layer. The environment contains a fixed amount of food, each piece represented as a red dot. When an agent moves, he loses energy, and when he is near the food, he gains energy. An agent with 0 energy dies. The input of the neural net is the current angle of the agent and a vector to the closest food. Every time step, the angle of movement of each agent is changed by the output of its neural net. The aim of course is to see food-seeking behavior evolve after some time. However, nothing happens.
I don't know if the problem is the structure of the neural net (too simple?) or the reproduction mechanism: to prevent population explosion, the initial population is about 20 agents, and as the population becomes close to 50, the reproduction chance approaches zero. When reproduction does occur, the parent is chosen by going over the list of agents from beginning to end, and checking for each agent whether or not a random number between 0 and 1 is less than the ratio between this agent's energy and the sum of the energy of all agents. If so, the search is over and this agent becomes a parent, and we add to the environment a copy of this agent with some probability of mutations in one or more of the weights in his neural network.
Thanks in advance!
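The parent-selection procedure described in the question can be sketched as below, next to standard fitness-proportionate ("roulette wheel") selection for comparison. The function names are made up, and the first function is only my reading of the description. It assumes at least one agent has positive energy.

```python
import random

def pick_parent_as_described(energies, rng=random):
    """Scan front to back; accept agent i with probability energy_i / total.

    Returns the index of the chosen agent, or None if nobody was accepted.
    """
    total = sum(energies)
    for i, energy in enumerate(energies):
        if rng.random() < energy / total:
            return i
    return None

def pick_parent_proportional(energies, rng=random):
    """Roulette-wheel selection: P(agent i) is exactly energy_i / total."""
    return rng.choices(range(len(energies)), weights=energies, k=1)[0]

# 20 agents with equal energy, sampled many times with a fixed seed.
rng = random.Random(1)
energies = [1.0] * 20
picks = [pick_parent_as_described(energies, rng) for _ in range(10000)]
none_fraction = picks.count(None) / len(picks)
```

With 20 equal agents the scan finds no parent about 36% of the time, since (1 - 1/20)^20 is roughly 0.358, and it favors agents near the front of the list, because later agents are only considered when all earlier ones were rejected.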
If the environment is benign enough (e.g. it's easy enough to find food) then just moving randomly may be a perfectly viable strategy, and reproductive success may be far more influenced by luck than anything else. Also consider unintended consequences: e.g. if offspring is co-sited with its parent then both are immediately in competition with each other in the local area, and this might be sufficiently disadvantageous to lead to the death of both in the longer term.
To test your system, introduce an individual with a "premade" neural network set up to steer the individual directly towards the nearest food (your model is such that such a thing exists and is reasonably easy to write down, right? If not, it's unreasonable to expect it to evolve!). Introduce that individual into your simulation amongst the dumb masses. If the individual doesn't quickly dominate, it suggests your simulation isn't set up to reinforce such behaviour. But if the individual enjoys reproductive success and it and its descendants take over, then your simulation is doing something right and you need to look elsewhere for the reason such behaviour isn't evolving.
Update in response to comment:
Seems to me this mixing of angles and vectors is dubious. Whether individuals can evolve towards the "move straight towards nearest food" behaviour must rather depend on how well an atan function can be approximated by your network (I'm sceptical). Again, this suggests more testing:
Set aside all the ecological simulation and just test perturbing a population of your style of random networks to see if they can evolve towards the expected function.
(simpler, better) Have the network output a vector (instead of an angle): the direction the individual should move in (of course this means having 2 output nodes instead of one). Obviously the "move straight towards food" strategy is then just a straight pass-through of the "direction towards food" vector components, and the interesting thing is then to see whether your random networks evolve towards this simple "identity function" (also should allow introduction of a readymade optimised individual as described above).
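As a sketch of that second suggestion, here is a purely linear network with two inputs and two outputs (an assumption; your network has a hidden layer) where the "premade" individual is just the identity matrix. The fitness function, also made up for this example, scores any candidate by how closely it reproduces the direction-to-food vector, which lets you test evolution towards the identity function without the ecological simulation.

```python
import math
import random

def linear_network(inputs, weights, biases):
    """Linear layer: output_i = sum_j weights[i][j] * inputs[j] + biases[i]."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

# The "premade" individual: outputs the direction to food unchanged.
IDENTITY_WEIGHTS = [[1.0, 0.0], [0.0, 1.0]]
ZERO_BIASES = [0.0, 0.0]

def identity_fitness(weights, biases, samples):
    """Negative mean squared error against a straight pass-through."""
    error = 0.0
    for direction in samples:
        out = linear_network(direction, weights, biases)
        error += sum((o - d) ** 2 for o, d in zip(out, direction))
    return -error / len(samples)

# Random unit vectors standing in for "direction towards nearest food".
rng = random.Random(0)
angles = [rng.uniform(-math.pi, math.pi) for _ in range(50)]
samples = [[math.cos(a), math.sin(a)] for a in angles]

premade_score = identity_fitness(IDENTITY_WEIGHTS, ZERO_BIASES, samples)
random_weights = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
random_score = identity_fitness(random_weights, ZERO_BIASES, samples)
```

The premade individual scores 0, the best possible value, so you can watch whether mutated random networks climb towards it.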
I'm dubious about the "fixed amount of food" too. (I assume you mean that as soon as a red dot is consumed, another one is introduced.) A more "realistic" model might be to introduce food at a constant rate, and not impose any artificial population limits: population limits are then determined by the limitations of the food supply. E.g. if you introduce 100 units of food a minute and individuals need 1 unit of food per minute to survive, then your simulation should find it tends towards a long-term average population of 100 individuals without any need for a clamp to avoid a "population explosion" (although boom-and-bust, feast-or-famine dynamics may actually emerge depending on the details).
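A deliberately crude sketch of that arithmetic is below. The birth rate, the starting population and the rule that unfed individuals die immediately are all assumptions made up for illustration, and the model has no food storage and no randomness, so it shows the carrying capacity but not any boom-and-bust behaviour.

```python
def simulate_population(food_per_minute=100, need_per_minute=1,
                        start=20, birth_rate=0.1, minutes=200):
    """Return the number of fed survivors after each simulated minute."""
    capacity = food_per_minute // need_per_minute  # individuals the food can feed
    population = start
    history = []
    for _ in range(minutes):
        survivors = min(population, capacity)      # unfed individuals starve
        history.append(survivors)
        population = survivors + int(survivors * birth_rate)
    return history

history = simulate_population()
```

Starting from 20 individuals, the number of survivors grows until it reaches 100 and then stays there, with no explicit population clamp.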
This sounds like a problem for Reinforcement Learning; there is a good online textbook too.