Dungeon Game Solution explanation - python

The dungeon game is described as:
The demons had captured the princess (P) and imprisoned her
in the bottom-right corner of a dungeon.
The dungeon consists of M x N rooms laid out in a 2D grid.
Our valiant knight (K) was initially positioned in the top-left room
and must fight his way through the dungeon to rescue the princess.
The knight has an initial health point represented by a positive integer.
If at any point his health point drops to 0 or below, he dies immediately.
Some of the rooms are guarded by demons,
so the knight loses health (negative integers) upon entering these rooms;
other rooms are either empty (0's) or contain magic orbs that increase the knight's health (positive integers).
In order to reach the princess as quickly as possible,
the knight decides to move only rightward or downward in each step.
Write a function to determine the knight's minimum initial health
so that he is able to rescue the princess.
For example, given the dungeon below, the initial health of
the knight must be at least 8 if he follows the optimal path RIGHT -> RIGHT -> DOWN -> DOWN.
Notes:
The knight's health has no upper bound.
Any room can contain threats or power-ups, even the first room the knight enters
and the bottom-right room where the princess is imprisoned.
Example:
dungeon = [[-2, -3, 4],
           [-6, -15, 0],
           [10, 25, -6]]
Answer: 8
The code solution is:
    def dungeonGame(dungeon):
        dp = [float("inf") for _ in dungeon[0]]
        dp[-1] = 1
        for i in reversed(range(len(dungeon))):
            dp[-1] = max(dp[-1] - dungeon[i][-1], 1)
            for j in reversed(range(len(dungeon[i]) - 1)):
                min_HP_on_exit = min(dp[j], dp[j + 1])
                dp[j] = max(min_HP_on_exit - dungeon[i][j], 1)
        return dp[0]
Can somebody explain how the solution above works? Why does dp have length 3 for the provided example? Is it because only 3 steps are required, excluding the start and finish rooms? Why does it take the minimum of the adjacent dp values and then the maximum? Also, how come the last column seems not to be taken into consideration, since in dungeon[i][j] the index j only goes up to 1 (for the given example matrix)? I know the solution is written well; I'm just trying to understand how it takes all the paths into consideration.

This algorithm works its way back from the bottom right, going left and then up, finding the optimal score for each step along the way. I recommend you execute the algorithm with pen and paper, writing down the current values of i, j and dp along the way. That should really clear things up.
(Start): No i and no j yet, dp = [inf inf 1]
You'll need at least 1 HP after reaching the bottom right in order to win.
(After entering the first loop): i=2, dp = [inf inf 7].
You need 7 health to survive the -6 of the bottom right square itself.
(After entering the inner loop): i=2, j=1, dp = [inf 1 7]
If you're in the bottom center square, the bare minimum 1 health is enough to survive that square's +25, and reach the adjacent square that requires at least 7. And so on.
This is the crucial line that chooses between going right (stored in the next element of the intermediate results, dp[j + 1]) or down, dp[j].
min_HP_on_exit = min(dp[j], dp[j + 1])
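There are only three elements in the intermediate results because dp holds one value per column, and this dungeon has 3 columns. The solver keeps a single row of results at a time and overwrites it in place as it moves up: before room (i, j) is processed, dp[j] still holds the requirement for the room below it, and dp[j + 1] already holds the requirement for the room to its right. With the movement rules (only move right and down), those two values are all you need.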
Every time the solver moves up a line, the last column is taken care of as a special case here:
dp[-1] = max(dp[-1] - dungeon[i][-1], 1)
Why? Well, it's different from the other columns in that you can't move right, only down.
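The pen-and-paper trace suggested above can also be produced by a small instrumented copy of the solution. This is only a sketch: the function name dungeon_game_trace and the history list are additions for illustration and are not part of the original code.

```python
def dungeon_game_trace(dungeon):
    """Same algorithm as dungeonGame, but records dp after each row is processed."""
    dp = [float("inf")] * len(dungeon[0])
    dp[-1] = 1  # at least 1 HP must remain after the last room
    history = []
    for i in reversed(range(len(dungeon))):
        # Last column: the only way onward is down, so only dp[-1] matters.
        dp[-1] = max(dp[-1] - dungeon[i][-1], 1)
        for j in reversed(range(len(dungeon[i]) - 1)):
            # Cheaper of going down (dp[j]) or right (dp[j + 1]).
            min_hp_on_exit = min(dp[j], dp[j + 1])
            dp[j] = max(min_hp_on_exit - dungeon[i][j], 1)
        history.append((i, list(dp)))
    return dp[0], history


dungeon = [[-2, -3, 4],
           [-6, -15, 0],
           [10, 25, -6]]
answer, history = dungeon_game_trace(dungeon)
for row, values in history:
    print("after row", row, "dp =", values)
print("answer:", answer)
```

For the example dungeon this prints dp = [1, 1, 7] after row 2, [7, 16, 7] after row 1 and [8, 6, 3] after row 0, so the answer is 8.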


Python: Reinforcement learning Tic-Tac-Toe AI working?

So I'm scratching my head at this, and I don't know what's wrong. The code is here. The idea is that the AI plays games against itself, and is rewarded for winning or drawing, and punished for losing. Given a board, the AI will choose a move with certain weights. If the game ends up being a win or draw, those moves that it selected will have their weights increased. If the game ends up being a loss, those moves that it selected will have their weights decreased.
Instead what I observe is that 1) The 'X' player (Player 1) will almost always go for either the top left or bottom right square, rather than in the middle as expected, and 2) The 'X' player will become more and more favoured to win as the number of games increases.
I have no idea what is causing this behaviour, and I would appreciate any help.
Apparently stackoverflow requires you to also put code in to use pastebin, so here is the reward bit, although it probably makes more sense in the full context linked above.
    foo = ai_player
    for i in range(0,len(moves_made)):
        # Find the index of the move made
        weight_index = foo.children.index(moves_made[i])
        # If X won
        if checkWin(current_player.placements) == 1:
            if i % 2 == 0:
                foo.weights[weight_index] += 10
            else:
                foo.weights[weight_index] -= 10
                if foo.weights[weight_index] <= 0: foo.weights[weight_index] = 0
        # If O won
        if checkWin(current_player.placements) == -1:
            if i % 2 == 0:
                foo.weights[weight_index] -= 10
                if foo.weights[weight_index] <= 0: foo.weights[weight_index] = 0
            else:
                foo.weights[weight_index] += 10
        # If it was a draw
        if checkWin(current_player.placements) == 0:
            if i % 2 == 0:
                foo.weights[weight_index] += 5
            else:
                foo.weights[weight_index] += 5
        foo = foo.children[weight_index]
There are a number of issues with your original code for trying to get agents to learn.
1. You use absolute weights instead of some calculated heuristic for the weights. When adding weights, you want to use an update mechanism (e.g. Q-Learning; see the Wikipedia article on Q-learning). This bounds the board weights within the range of possible rewards and allows new, promising options to rise given positive rewards (an option that already has a score of +100 or +200 would otherwise be very hard to beat).
2. You assign weightings to the board indexes and not board positions. This will teach agents that certain squares of the board are 'the best' and other squares aren't, regardless of their utility in the current position (e.g. if there are X's on square 0 and square 1, O may still think its 'absolute' best move is bottom right, even though X can win next turn).
3. Moves seem to be random in this environment (as far as I can see). The agents need some way of 'exploring' to find which moves work and which moves don't, and then 'exploiting' the best moves found so far.
4. You only store the 'state' values and not the value of taking actions in a given state. With this, agents can calculate the value of the state they're in, but they need a way of calculating the value of actions they can take, to inform them of the best actions to take.
In the Reinforcement Learning framework, the agent takes in the state and needs to evaluate the best possible action to take, given the state.
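To make the points about an update mechanism, exploration and action values concrete, here is a minimal sketch of textbook tabular Q-learning with epsilon-greedy move selection. It is not the rewritten code from the pastebin: the names choose_action and q_update, the discount GAMMA and the dictionary layout are assumptions for illustration. The step size 0.1 and epsilon 0.08 are the values mentioned in this answer.

```python
import random
from collections import defaultdict

ALPHA = 0.1     # step size (0.1 appears in the update quoted in this answer)
GAMMA = 0.9     # discount factor (assumed value)
EPSILON = 0.08  # exploration rate (one of the values tried in this answer)


def choose_action(Q, state, actions, epsilon=EPSILON, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def q_update(Q, state, action, reward, next_state, next_actions,
             alpha=ALPHA, gamma=GAMMA):
    """Standard Q-learning update for one (state, action) pair."""
    # Value of the best follow-up action; 0.0 if the game is over.
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])


# States must be hashable, e.g. a tuple of 9 cells for a tic-tac-toe board.
Q = defaultdict(float)
empty = (" ",) * 9
q_update(Q, empty, 4, reward=1.0, next_state=None, next_actions=[])
print(Q[(empty, 4)])
```

Because the table is keyed by (state, action), the value of a move depends on the whole board position rather than on the square index alone, which addresses points 2 and 4.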
I've re-written your code, using a Q-learning table to get the agents moving appropriately (moving the agents into a class). Also using NumPy for argmax etc functions.
Q-Learning function is:
    self.Q_table[(state, action)] = q_val + 0.1 * \
        (reward + self.return_value[(state, action)] / self.return_number[(state, action)])
Code is in a pastebin.
With Epsilon of 0.08 and 1,000,000 episodes, the win rate for X was ~16%, Y was ~7% and about ~77% draws.
With Epsilon of 0.04 and 1,000,000 episodes, the win rate for X was ~9%, Y was ~3% and about ~88% draws.
With Epsilon of 0.04 and 2,000,000 episodes - almost identical results to 1m episodes.
It's hard to tell just like this but possibly you reward them for playing draws and it keeps happening?
Also, you should evaluate checkWin() once at the start and you could make a simple function like update(weight) and pass 10, -10 or 5 depending on your case
Is your algorithm:
0. init all weights to 100 for every board state
1. reset board
2. get board state
3. make a random (weighted) move depending on weights/board state
4. check if the game ends, if so update weights and go back to 1, else go to 2
?
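The suggestion to evaluate checkWin() once and route every adjustment through one small helper could look roughly like the sketch below. The Node class, the REWARDS table and the function name apply_rewards are stand-ins invented for this example; the reward sizes (10, -10, 5) and the clamp at 0 come from the code in the question.

```python
class Node:
    """Minimal stand-in for the question's tree node (hypothetical)."""
    def __init__(self, children=None, weights=None):
        self.children = children or []
        self.weights = weights or []


# result -> (delta for X's moves, delta for O's moves)
REWARDS = {1: (10, -10), -1: (-10, 10), 0: (5, 5)}


def apply_rewards(root, moves_made, result):
    """Walk the played line once, adjusting each chosen move's weight.

    `result` is the value of checkWin(), computed once by the caller:
    1 if X won, -1 if O won, 0 for a draw.
    """
    x_delta, o_delta = REWARDS[result]
    node = root
    for i, move in enumerate(moves_made):
        index = node.children.index(move)
        delta = x_delta if i % 2 == 0 else o_delta  # even plies are X's moves
        node.weights[index] = max(node.weights[index] + delta, 0)  # never negative
        node = node.children[index]


leaf = Node()
mid = Node(children=[leaf], weights=[100])
root = Node(children=[mid], weights=[100])
apply_rewards(root, [mid, leaf], result=1)  # X won
print(root.weights, mid.weights)
```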

set variable number of input nodes in python neat

I have this simple game where a ball bounces around the screen and the player can move left and right and shoot an arrow up to pop the ball. Every time the player hits a ball, it bursts and splits into two smaller balls, until they reach a minimum size and disappear.
I am trying to solve this game with a genetic algorithm based on the python neat library and on this Flappy Bird tutorial: https://www.youtube.com/watch?v=MMxFDaIOHsE&list=PLzMcBGfZo4-lwGZWXz5Qgta_YNX3_vLS2. I have a configuration file in which I have to specify how many input nodes the network has. I had thought of giving as input the player's x-coordinate, the distance between the player's x-coordinate and the ball's x-coordinate, and the distance between the player's y-coordinate and the ball's y-coordinate.
My problem is that at the beginning of the game I have only one ball, but after a few moves there could be more balls on the screen, so I would need a greater number of input nodes: the more balls there are on the screen, the more input coordinates I have to provide to the network.
So how can I set the number of input nodes in a variable way?
config-feedforward.txt file
"""
# network parameters
num_hidden = 0
num_inputs = 3 #this needs to be variable
num_outputs = 3
"""
python file
    for index,player in enumerate(game.players):
        balls_array_x = []
        balls_array_y = []
        for ball in game.balls:
            balls_array_x.append(ball.x)
            balls_array_y.append(ball.y)
        output = np.argmax(nets[index].activate(("there may be a number of variable arguments here")))
        #other...
final code
    for index,player in enumerate(game.players):
        balls_array_x = []
        balls_array_y = []
        for ball in game.balls:
            balls_array_x.append(ball.x)
            balls_array_y.append(ball.y)
        distance_list = []
        player_x = player.x
        player_y = player.y
        i = 0
        while i < len(balls_array_x):
            dist = math.sqrt((balls_array_x[i] - player_x) ** 2 + (balls_array_y[i] - player_y) ** 2)
            distance_list.append(dist)
            i+=1
        i = 0
        if len(distance_list) > 0:
            nearest_ball = min(distance_list)
        output = np.argmax(nets[index].activate((player.x,player.y,nearest_ball)))
This is a good question and as far as I can tell from a quick Google search hasn't been addressed for simple ML algorithms like NEAT.
The traditional resizing methods used for deep NNs (padding, cropping, RNNs, middle layers, etc.) cannot be applied directly here, since NEAT explicitly encodes every single neuron and connection.
I am also not aware of any general method or trick to make the input size mutable for the traditional NEAT algorithm, and frankly I don't think there is one. I can think of a couple of changes to the algorithm that would make this possible, but I suppose that's of no help to you.
In my opinion you therefore have 3 options:
1. You increase the input size to the maximum number of balls the algorithm should track and set the x-diff/y-diff value of non-existent balls to an otherwise impossible number (e.g. -1). If balls come into existence you actually set the values for those x-diff/y-diff input neurons and set them to -1 again when they are gone. Then you let NEAT figure it out. Also worth thinking about concatenating 2 separate NEAT NNs, with the first NN having 2 inputs, 1 output and the second NN having 1 (player pos) + x (max number of balls) inputs and 2 outputs (left, right). The first NN produces an output for each ball position (and is identical for each ball) and the second NN takes the first NN's output and turns it into an action. Also: the maximum number of balls doesn't have to be the maximum number of displayable balls, but can also be limited to 10, considering only the 10 closest balls.
2. You only consider 1 ball for each action side (making your input 1 + 2*2). This could be the lowest ball on each side or the closest ball on each side. However, such preprocessing can make simple NN tasks like this one quite easy to solve. Maybe you can add inertia to your test environment and thereby add a non-linearity that makes it not so straightforward to always teleport/hurry to the lowest ball.
3. You input the whole observation space into NEAT (or a uniformly downsampled fraction), e.g. the whole game at whatever resolution is lowest but still sensible. I know that this observation space is huge, but NEAT works quite well in handling such spaces.
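The first option above (a fixed-size input padded with an impossible value) can be sketched without touching NEAT itself. Everything named here is an assumption for illustration: build_inputs, MAX_BALLS, MISSING, and the representation of balls as (x, y) tuples. Absolute differences are used so that -1 really is impossible; if the network also needs to know which side a ball is on, a different placeholder would have to be chosen.

```python
import math

MAX_BALLS = 3   # hypothetical cap on how many balls the network "sees"
MISSING = -1.0  # placeholder for a ball slot that is currently empty


def build_inputs(player_x, player_y, balls, max_balls=MAX_BALLS):
    """Return a fixed-length input tuple: player x plus (dx, dy) per ball slot.

    Only the max_balls closest balls are kept; unused slots are padded.
    """
    nearest = sorted(
        balls,
        key=lambda ball: math.hypot(ball[0] - player_x, ball[1] - player_y),
    )[:max_balls]
    inputs = [player_x]
    for ball_x, ball_y in nearest:
        inputs.extend([abs(ball_x - player_x), abs(ball_y - player_y)])
    # Pad so the length is always 1 + 2 * max_balls.
    inputs.extend([MISSING] * (2 * (max_balls - len(nearest))))
    return tuple(inputs)


print(build_inputs(100, 400, [(130, 100)]))
print(len(build_inputs(100, 400, [(i * 10, 50) for i in range(8)])))
```

With MAX_BALLS = 3 the configuration file would use num_inputs = 7, and the tuple can be passed to activate() in place of the three values used in the question.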
I know that this is not the variable input size option of NEAT that you might have hoped for, but I don't know about any such general option/trick without changing the underlying NEAT algorithm significantly.
However, I am very happy to be corrected if someone knows a better option!

Struggling with "simple" chess related randomized code

My task is to write code that randomly places two chess pieces on the board.
The board looks like this: Chess board
What I should do is "place" two randomly selected chess pieces (one black, one white) on this board above and find out if these pieces fall in places where they would threat each other based on chess rules. I should repeat this 10 000 times and find out how often a threatening situation is met. Pieces should not fall in the same square.
There are two different scenarios:
3a. The chess pieces are rooks (towers), which move horizontally or vertically, OR
3b. The chess pieces are queens, which move horizontally, vertically and diagonally.
My assumption is that the approach in 3a should be that if either the first digit or the last digit is the same for both chess pieces, the situation causes a threat. I would use the random function to assign random numbers to both pieces and run this 10K times. I'm not sure, though, how to build this code to find out if the chess pieces threaten each other. How do I use the chess board (11-88) to define the random numbers?
The approach for 3b is, on the other hand, a big question mark due to the fact that queens can also move diagonally.
I would really appreciate some code example to understand how this is tackled.
Thanks!
EDIT: Below is the code I wrote myself using the skills I have acquired so far. I believe this gives me the answer to how often a queen threatens another queen, BUT it does not take into account that the chess pieces can't be placed on the same square of the board. How could I implement that in my code, assuming the code is otherwise correct? I'd like to modify it as little as possible.
    import random
    rows=range(1,9) #generates the row number
    columns=range(97,97+8) #generates the column number
    hot=0 #times threat took place
    icke_hot=0 #times threat did not take place
    for spel in range(10000):
        white_row=random.choice(rows) #Gets position in row
        black_row=random.choice(rows)
        white_column=random.choice(columns) #Gets position in column
        black_column=random.choice(columns)
        if white_column==black_column or white_row==black_row: #checks if rook can attack
            hot=hot+1
        elif white_column==black_column or white_column==black_column+1 or white_column==black_column-1:
            hot=hot+1
            #checks if queen can attack
        elif white_row==black_row or white_row==black_row+1 or white_row==black_row-1:
            hot=hot+1
            #checks if queen can attack
        else:
            icke_hot=icke_hot+1
    print("Threats took place in", hot/100.0, "% of cases")
    print(hot)
    print(icke_hot)
Comments on OP Code
Column numbering is problematic
    columns=range(97,97+8) #generates the column number 97, 98, ...
You could use
    columns = [chr(i) for i in range(97,97+8)]
The following generation of rows/columns does not ensure that white & black are not assigned the same square.
    white_row=random.choice(rows) #Gets position in row
    black_row=random.choice(rows)
    white_column=random.choice(columns) #Gets position in column
    black_column=random.choice(columns)
It's better to define a position as a row & column pair and then choose two of the 64 positions at random using random.sample, which samples without replacement (random.choices samples with replacement, so it could return the same square twice).
This code doesn't check all positions along the diagonals for a queen attack.
    elif white_column==black_column or white_column==black_column+1 or white_column==black_column-1:
        hot=hot+1
        #checks if queen can attack
    elif white_row==black_row or white_row==black_row+1 or white_row==black_row-1:
        hot=hot+1
        #checks if queen can attack
Revised Code
    import random

    def all_positions():
        " Generates list of all piece positions (i.e. 64 positions) "
        rows = '12345678'
        columns = 'abcdefgh'
        return [r+c for r in rows for c in columns]  # i.e. ['1a', '1b', ...]

    def random_positions():
        " Two random positions (not equal) "
        # Use random sample without replacement
        return random.sample(all_positions(), 2)

    def position_to_numeric(position):
        " Convert row, column string to numeric tuple i.e. '3a' -> 3, 1 "
        return int(position[0]), 'abcdefgh'.index(position[1]) + 1

    def can_rook_attack(position1, position2):
        " Do rooks at position1 and position2 attack each other "
        return position1[0] == position2[0] or position1[1] == position2[1]

    def can_queen_attack(position1, position2):
        " if queens at position 1 & 2 can attack each other "
        # Check as rooks they would attack
        if can_rook_attack(position1, position2):
            return True
        # Get positions as numeric
        r1, c1 = position_to_numeric(position1)
        r2, c2 = position_to_numeric(position2)
        # Queens attack diagonally when row and column distances are equal
        return abs(r1 - r2) == abs(c1 - c2)

    def simulate_rooks(number_trials):
        " Simulate of how often rook can attack "
        cnt = 0
        for _ in range(number_trials):
            position1, position2 = random_positions()
            if can_rook_attack(position1, position2):
                cnt += 1
        return f'Rook attacked {cnt} out of {number_trials} random placements'

    def simulate_queens(number_trials):
        " Simulate of how often queens can attack "
        cnt = 0
        for _ in range(number_trials):
            position1, position2 = random_positions()
            if can_queen_attack(position1, position2):
                cnt += 1
        return f'Queen attacked {cnt} out of {number_trials} random placements'
Test
simulations = 10000
print(simulate_rooks(simulations))
print(simulate_queens(simulations))
Output
Rook attacked 2257 out of 10000 random placements
Queen attacked 3634 out of 10000 random placements
Explanation
Key Issues
What I should do is "place" two randomly selected chess pieces (one
black, one white) on this board above and find out if these pieces
fall in places where they would threat each other based on chess
rules. I should repeat this 10 000 times and find out how often a
threatening situation is met. Pieces should not fall in the same
square.
This is done in simulate_rooks and simulate_queens with the following code
    for _ in range(number_trials):
        position1, position2 = random_positions()
Where random_positions uses random.sample (without replacement) to select two positions out of all possible positions
My assumption is that the approach in 3a should be that if either the first digit or last digit are same for both chess pieces the situation
This is done with the code in can_rook_attack
return position1[0] == position2[0] or position1[1] == position2[1]
the approach for 3b is, on the other hand, a big question mark due to the fact that queens can also move diagonally.
This is solved using:
return abs(r1 - r2) == abs(c1 - c2)
This uses the constraint that to be along the diagonal the slope has to be +/-1 which leads to the above expression.
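As a sanity check on the simulated counts, the exact probabilities can be obtained by enumerating every ordered pair of distinct squares, since 64 * 63 = 4032 pairs is small. This is an extra sketch rather than part of the revised code; it uses (row, column) integer tuples and its own helper names rook_attacks and queen_attacks.

```python
from itertools import permutations

squares = [(row, col) for row in range(1, 9) for col in range(1, 9)]


def rook_attacks(a, b):
    """Same row or same column."""
    return a[0] == b[0] or a[1] == b[1]


def queen_attacks(a, b):
    """Rook move, or a diagonal (equal row and column distance)."""
    return rook_attacks(a, b) or abs(a[0] - b[0]) == abs(a[1] - b[1])


pairs = list(permutations(squares, 2))  # ordered pairs of distinct squares
rook_hits = sum(rook_attacks(a, b) for a, b in pairs)
queen_hits = sum(queen_attacks(a, b) for a, b in pairs)

print(f"rooks:  {rook_hits}/{len(pairs)} = {rook_hits / len(pairs):.4f}")
print(f"queens: {queen_hits}/{len(pairs)} = {queen_hits / len(pairs):.4f}")
```

The enumeration gives 896/4032 (about 22.2%) for rooks and 1456/4032 (about 36.1%) for queens, which agrees with the simulated 2257 and 3634 out of 10000.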

Snakes and ladders, check if ever will land on last square

I am implementing a snakes and ladders game in Python using linked lists. Each node links to the next square, and the last square links back to the first square (the list is circular). I also have snakes and ladders, so each node also has a parameter called destination, which is None if it does not link anywhere; otherwise it contains the address of the other node.
Something special about my game is that I have a fixed roll. If my fixed roll is a 4, I will always move 4 nodes. If the node I land on is connected to a snake or ladder, then I will go there.
I start off at the square whose number equals my roll (the 4th square for a roll of 4).
I need a way to check if I will ever land on the last square.
Consider 16 squares, and a roll of 2. I start at 2nd square. But there is a ladder, so I move to 11th square. Now every time I move 2 nodes. After two turns, I will move to the yellow square. Then when I move again, I will move to final square, and back to square 1 (you have to land on final square to win). But then I notice that if I keep rolling 2's, I will never land on the final square, and I need a way of detecting this.
I don't need any code, but just some suggestions of how I can detect if I will never land on the final square. Thank you
Your problem translates to the problem of finding a cycle in your square traversal.
The overall idea goes as follows: "If I have visited the same node more than once without reaching the final square, then I will never reach it."
You can implement this, for example, by including a visited member to the square class and checking if you arrive to a square that was visited before. In that case you can stop the traversal.
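A minimal sketch of that visited-set idea, using plain square numbers instead of the question's linked list. The function name reaches_last_square, the jumps dictionary (square -> destination) and the 1-based numbering are assumptions made for this example.

```python
def reaches_last_square(n_squares, roll, jumps, start):
    """Return True if moving `roll` squares at a time ever lands on the last square.

    Squares are numbered 1..n_squares and the board is circular.
    `jumps` maps a square to the destination of its snake or ladder.
    """
    position = jumps.get(start, start)
    visited = set()
    while position not in visited:
        if position == n_squares:
            return True
        visited.add(position)
        # Advance by the fixed roll, wrapping past the last square to square 1.
        position = (position - 1 + roll) % n_squares + 1
        # Follow a snake or ladder if there is one on the landing square.
        position = jumps.get(position, position)
    # A square was revisited, so the sequence of positions repeats forever.
    return False


# The example from the question: 16 squares, roll of 2, ladder from 2 to 11.
print(reaches_last_square(16, 2, {2: 11}, start=2))  # False
print(reaches_last_square(16, 2, {}, start=2))       # True
```

Because the roll is fixed, each position has exactly one successor, so revisiting any square proves the final square is unreachable.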
You can do reachability analysis on the graph representing moves. Here is the code to do that:
    nodes = list(range(16))
    roll = list(range(1,4))
    A = list(range(16))
    A[1] = 10
    A[9] = 6
    A[5] = 13
    edges = {(i,j): A[(i+j)%16] for i in nodes for j in roll}
    change = True
    start = current = 0
    states = set([start])
    oldlen = len(states)
    while change:
        current = edges[(current, 2)]
        states.add(current)
        change = (oldlen != len(states))
        oldlen = len(states)
    print(states)
If you have multiple possible moves from a position, change detection would be a bit more complicated.

How to generate statistically probable locations for ships in battleship

I made the original battleship and now I'm looking to upgrade my AI from random guessing to guessing statistically probable locations. I'm having trouble finding algorithms online, so my question is: what kinds of algorithms already exist for this application? And how would I implement one?
Ships: 5, 4, 3, 3, 2
Field: 10X10
Board:
OCEAN = "O"
FIRE = "X"
HIT = "*"
SIZE = 10
SEA = [] # Blank Board
for x in range(SIZE):
    SEA.append([OCEAN] * SIZE)
If you'd like to see the rest of the code, I posted it here: (https://github.com/Dbz/Battleship/blob/master/BattleShip.py); I didn't want to clutter the question with a lot of irrelevant code.
The ultimate naive solution would be to go through every possible placement of ships (legal given the information known so far) and count the number of times each square is occupied.
Obviously, on a relatively empty board this will not work, as there are too many permutations, but a good start might be:
For each square on the board: go through all ships and count in how many different ways each ship fits over that square, i.e. for each of the ship's squares that could lie on this square, check whether the ship fits horizontally and vertically.
An improvement might be to also check, for each possible ship placement, whether the rest of the ships can be placed legally whilst covering all known 'hits' (places known to contain a ship).
To improve performance, if only one ship can be placed in a given spot, you no longer need to test it on other spots. Also, when there are many 'hits', it might be quicker to first cover all known 'hits' and, for each possible cover, go through the rest.
Edit: you might want to look into DFS.
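The simple per-square count described above can be sketched as follows. It reuses the board constants from the question and assumes that FIRE ("X") marks a shot that missed; the function names placement_counts and best_target are made up for this example, and the refinement of checking that the remaining ships still fit is left out.

```python
OCEAN, FIRE, HIT = "O", "X", "*"
SIZE = 10
SHIPS = (5, 4, 3, 3, 2)


def placement_counts(sea, ships=SHIPS):
    """For every square, count the ship placements that cover it.

    A placement is legal here if it stays on the board and does not
    cover a known miss (FIRE). Known hits are allowed.
    """
    counts = [[0] * SIZE for _ in range(SIZE)]
    for length in ships:
        for row in range(SIZE):
            for col in range(SIZE):
                for d_row, d_col in ((0, 1), (1, 0)):  # horizontal, vertical
                    cells = [(row + d_row * k, col + d_col * k) for k in range(length)]
                    if all(r < SIZE and c < SIZE and sea[r][c] != FIRE for r, c in cells):
                        for r, c in cells:
                            counts[r][c] += 1
    return counts


def best_target(sea):
    """Pick the not-yet-shot square covered by the most placements."""
    counts = placement_counts(sea)
    candidates = [(counts[r][c], r, c)
                  for r in range(SIZE) for c in range(SIZE) if sea[r][c] == OCEAN]
    return max(candidates)[1:]


sea = [[OCEAN] * SIZE for _ in range(SIZE)]
counts = placement_counts(sea)
print(counts[0][0], counts[4][4])  # corner vs. central square
```

On an empty board a corner is covered by 10 placements and a central square such as (4, 4) by 34, which is why the centre is a better opening shot than the edge.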
Edit: Elaboration on OP's (#Dbz) suggestion in the comments:
Hold a set of dismissed placements ('dismissed') of ships (each can be represented as a string, say "4V5x3" for the placement of the length-4 ship on 5x3, 5x4, 5x5, 5x6). After a guess you add all the placements the guess dismisses. Then, for each square, hold a set of placements that intersect with it ('placements[x,y]'). The probability would then be:
    (34 - |intersection(placements[x,y], dismissed)|) / (3400 - |dismissed|)
To add to the dismissed list:
1. if the guess at (X,Y) is a miss, add placements[x,y]
2. if the guess at (X,Y) is a hit:
    add neighboring placements (assuming that ships cannot be placed adjacently), i.e. add:
        <(2,3a,3b,4,5)>H<X+1>x<Y>, <(2,3a,3b,4,5)>V<X>x<Y+1>
        <(2,3a,3b,4,5)>H<X-(2,3,3,4,5)>x<Y>, <(2,3a,3b,4,5)>V<X>x<Y-(2,3,3,4,5)>
        2H<X+-1>x<Y+(-2 to 1)>, 3aH<X+-1>x<Y+(-3 to 1)> ...
        2V<X+(-2 to 1)>x<Y+-1>, 3aV<X+(-3 to 1)>x<Y+-1> ...
    if |intersection(placements[x,y], dismissed)| == 33, i.e. only one placement is possible, add the ship (see later)
3. check if any of the previous hits has only one possible placement left; if so, add the ship
4. check to see if any of the ships has only one possible placement; if so, add the ship
Adding a ship:
    add all other placements of that ship to dismissed
    for each (x,y) of the ship's placement, add placements[x,y] without the actual placement
    for each (x,y) of the ship's placement, mark it as a hit guess (if not already known) and run stage 2
    for each (x,y) neighboring the ship's placement, mark it as a miss guess (if not already known) and run stage 1
    run stages 3 and 4.
I might have overcomplicated this and there might be some redundant actions, but you get the point.
Nice question, and I like your idea for statistical approach.
I think I would have tried a machine learning approach for this problem as follows:
First model your problem as a classification problem.
The classification problem is: Given a square (x,y) - you want to tell the likelihood of having a ship in this square. Let this likelihood be p.
Next, you need to develop some 'features'. You can take the surrounding of (x,y) [as you might have partial knowledge on it] as your features.
For example, the features of the middle of the following mini-board (+ indicates the square you want to determine if there is a ship or not in):
OO*
O+*
?O?
can be something like:
f1 = (0,0) = false
f2 = (0,1) = false
f3 = (0,2) = true
f4 = (1,0) = false
**note skipping (1,1)
f5 = (1,2) = true
f6 = (2,0) = unknown
f7 = (2,1) = false
f8 = (2,2) = unknown
I'd implement features relative to the point of origin (in this case, (1,1)) and not as absolute locations on the board (so the square directly above (3,3) will also be f2).
Now, create a training set. The training set is a 'labeled' set of features - based on some real boards. You can create it manually (create a lot of boards), automatically by a random generator of placements, or by some other data you can gather.
Feed the training set to a learning algorithm. The algorithm should be able to handle 'unknowns' and be able to give probability of "true" and not only a boolean answer. I think a variation of Naive Bayes can fit well here.
After you have got a classifier - exploit it with your AI.
When it's your turn, choose to fire upon a square which has the maximal value of p. At first, the shots will be kinda random - but with more shots you fire, you will have more information on the board, and the AI will exploit it for better predictions.
Note that I gave features based on a square of size 1. You can of course choose any k and find features on this bigger square - it will give you more features, but each might be less informative. There is no rule of thumb which will be better - and it should be tested.
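The feature extraction described above could be sketched like this. The function name neighborhood_features and the symbol mapping are assumptions that follow the mini-board in this answer ('*' = ship, 'O' = no ship, anything else = unknown); squares that fall off the board are also treated as unknown here, which is one possible choice among several.

```python
def neighborhood_features(board, x, y, k=1):
    """Return the features of the (2k+1) x (2k+1) square centred on (x, y).

    Each feature is True (ship), False (no ship) or None (unknown).
    The centre square itself is skipped, so positions are relative to (x, y).
    """
    meaning = {"*": True, "O": False}
    features = []
    for dx in range(-k, k + 1):
        for dy in range(-k, k + 1):
            if dx == 0 and dy == 0:
                continue  # skip the square being classified
            row, col = x + dx, y + dy
            if 0 <= row < len(board) and 0 <= col < len(board[0]):
                features.append(meaning.get(board[row][col]))
            else:
                features.append(None)  # off the board: treated as unknown
    return features


mini_board = ["OO*",
              "O+*",
              "?O?"]
print(neighborhood_features(mini_board, 1, 1))
```

For the mini-board this yields [False, False, True, False, True, None, False, None], matching f1 to f8 above. Rows of such features, each labeled with whether the centre square held a ship, would form the training set.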
Main question is, how are you going to find statistically probable locations. Are they already known or you want to figure them out?
In either case, I'd just make the grid weighted. In your case, the initial weight for each slot would be 1.0/(SIZE^2). The sum of weights must be equal to 1.
You can then adjust weights based on the statistics gathered from the last N games played.
Now, when your AI makes a choice, it chooses a coordinate to hit based on the weighted probabilities. The quick and simple way to do that would be:
1. Generate a random number R in the range [0..1]
2. Start from slot (0, 0) adding the weights, i.e. S = W(0, 0) + W(0, 1) + ..., where W(n, m) is the weight of the corresponding slot. Once S >= R, you've got the coordinate to hit.
This can be optimised by pre-calculating cumulative weights for each row, have fun :)
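The two selection steps above translate almost directly into code. This is a sketch under the stated assumption that the weights sum to 1; the function name pick_weighted and the optional r parameter (handy for testing with a fixed R) are additions for illustration.

```python
import random


def pick_weighted(weights, r=None):
    """Pick a (row, column) slot with probability proportional to its weight.

    `weights` is a 2D list whose entries sum to 1. `r` is the random
    number R in [0, 1]; it is drawn with random.random() if omitted.
    """
    if r is None:
        r = random.random()
    running_sum = 0.0
    for row_index, row in enumerate(weights):
        for col_index, weight in enumerate(row):
            running_sum += weight
            if running_sum >= r:
                return row_index, col_index
    # Floating-point round-off can leave the sum just below r: use the last slot.
    return len(weights) - 1, len(weights[-1]) - 1


SIZE = 10
uniform = [[1.0 / (SIZE * SIZE)] * SIZE for _ in range(SIZE)]
print(pick_weighted(uniform))
```

With uniform weights this is ordinary random guessing; the behaviour changes only once the weights have been adjusted from earlier games.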
Find out which ships are still alive:
alive = [2,2,3,4] # length of alive ships
Find out spots where you have not shot, for example with a numpy.where()
Loop over spots where you can shoot.
Check the sides of the given position. Go left and right: how many free spaces are there? Go up and down: how many free spaces? If you can fit a boat in that many spaces, you can also fit any smaller boat, so I'd run this loop from the largest ship downwards and add +1 to the count for this position for the ship that fits and for every smaller ship.
Once you have done all of this, the position with the most points should be the most probable one to attack and hit something.
Of course, it can get as complicated as you want. Instead of asking which square to hit next, you can also ask which combination of hits gives you victory in the fewest shots, or use any other combination/parametrization of the problem. Good luck!
