To use Q-learning I need to create a two-dimensional NumPy array of size [number of actions][number of states].
My actions are a list of directions that I can easily convert to integers.
My state is a Box: gym.spaces.Box(low=np.int8(0), high=np.int8(2), shape=(5, 5), dtype=np.int8)
I need to find a way to take my state and convert it to an integer so I can index into the Q-table. I'm aware of wrappers, but I don't know how to use them or which one should be used.
The previous section is a quick summary; here's the problem in more detail:
I'm creating an OpenAI Gym environment for the game Neutreeko. The actions are the 8 directions and the state is the Box (or a NumPy array). The creator of the game states:
There are 3,450,516 valid board positions.
So I need to find a way to map each board to an ID to use as the index for the Q-table.
I asked the creator for a bit of help on how he arrived at this number, and he answered:
I am not entirely sure how I arrived at 3,450,515 (this would date back around 19 years). I have probably assumed that the Next player (N) does not already have three in a row, which gives them C(25, 3) - 48 = 2252 possible positions for their pieces, where C(n, k) denotes the binomial coefficient (n over k). The Previous player (P) should then have C(25 - 3, 3) = 1540 possible positions for each of them, for a total of 2252 x 1540 = 3,468,080 positions. Most likely I have subtracted positions that are impossible to reach because there is no position from which the Previous player could have made a legal move and reached the current position.
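The arithmetic checks out, e.g. with Python's math.comb:

from math import comb

comb(25, 3) - 48                   # 2252 placements for the Next player's pieces
comb(25 - 3, 3)                    # 1540 placements for the Previous player's pieces
(comb(25, 3) - 48) * comb(22, 3)   # 3468080, before subtracting unreachable positions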
Taking a simple example of 5 available positions and 3 pieces, there are C(5, 3) = 10 possible dispositions of the pieces, but I can't find a way to turn each disposition into an ID using the available information (the indices of the allocated pieces).
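One standard way to assign such an ID is the combinatorial number system, i.e. the lexicographic rank of a sorted combination. A minimal sketch, assuming the piece positions come as a sorted tuple of cell indices (the function name is illustrative):

from math import comb

def combination_rank(positions, n):
    """Lexicographic rank of a sorted tuple of cell indices among all
    C(n, k) ways to place k pieces on n cells (combinatorial number system)."""
    rank, k, prev = 0, len(positions), -1
    for i, p in enumerate(positions):
        # Count the combinations whose value in slot i is smaller than p.
        for q in range(prev + 1, p):
            rank += comb(n - q - 1, k - i - 1)
        prev = p
    return rank

# The 5-cells / 3-pieces example: ranks run 0..9 over the C(5, 3) = 10 dispositions.
print(combination_rank((0, 1, 2), 5))  # 0
print(combination_rank((0, 1, 3), 5))  # 1
print(combination_rank((2, 3, 4), 5))  # 9

For the full board, one option (an assumption on my part, not necessarily how the creator numbered positions) is to rank the Next player's 3 cells among the 25 cells, rank the Previous player's 3 cells among the remaining 22, and combine them as id = rank_N * comb(22, 3) + rank_P; this wastes the IDs of unreachable positions but still indexes every state. The conversion could live in a gym.ObservationWrapper by overriding its observation() method.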
I am building a model and I need to get the positions of some points inside a box (a known volume). I am thinking of using
a) numpy.linspace(start, stop, 30)
b) numpy.linspace(start, stop, 3000)
over the same box; I think I need a tool to exclude the points produced by a) from b).
Example (2D):
Say that we have a line of length 20, and we need to distribute two types of pieces along it:
1) 10 pieces of length 1, 2) 4 pieces of length 2.
- The space between a type-1 piece (small line) and any neighbour is equal, whatever the neighbour's type.
- The small pieces are distributed equally around each type-2 piece.
This solution is the only one that worked for me:
Get the xyz file from some other software, such as Jmol.
Then you have the orientations of the model.
I wrote the orientations into my program to avoid overlapping.
Does
filtered_b = np.setdiff1d(np.linspace(start, stop, 3000), np.linspace(start, stop, 30))
do it? This chooses the points of b) that are not in a).
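A small demo of that idea. Note that np.setdiff1d compares floats for exact equality, so in general only values that coincide bit-for-bit (such as the endpoints) get removed; a tolerance-based variant with np.isclose is shown as well (the eps value is an arbitrary choice):

import numpy as np

start, stop = 0.0, 10.0
a = np.linspace(start, stop, 30)
b = np.linspace(start, stop, 3000)

# Exact set difference: keeps the values of b that are not bit-for-bit in a.
filtered_b = np.setdiff1d(b, a)

# Tolerance-based variant: drop any point of b within eps of some point of a.
eps = 1e-9
mask = ~np.isclose(b[:, None], a[None, :], atol=eps).any(axis=1)
filtered_b_tol = b[mask]
print(len(b), len(filtered_b), len(filtered_b_tol))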
Let A be a 1D numpy array, a threshold t, and a window length K.
How do I find the minimal index j such that (A[j:j+K] < t).all()? (I.e., the first time A stays below the threshold over a full window of width K.)
I've tried (unfinished) things with a loop, but it seemed far from optimal, and I thought maybe there's a clever "numpy way" to do it.
Sidenote: testing whether we're below the threshold over a whole window, instead of at a single point, is useful to avoid on/off/on/off artefacts near the threshold (see also hysteresis: "hysteresis is intentionally added to an electronic circuit to prevent unwanted rapid switching [...] compensate for contact bounce in switches, or noise in an electrical signal.").
Approach #1
We can use 1D convolution -
np.flatnonzero(np.convolve(A<t, np.ones(K,dtype=int))==K)[0]-K+1
The idea is to get the boolean array from the comparison against the threshold, and then run a 1D convolution with a kernel of the same length as the window, filled with 1s. This gives us the sum of each sliding window, so all windows whose sum is K are the ones we are after. Use flatnonzero to get the starting indices of the valid windows, and finally select the first one.
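A quick sanity check on some made-up data:

import numpy as np

A = np.array([5, 1, 1, 4, 1, 1, 1, 6])
t, K = 3, 3

j = np.flatnonzero(np.convolve(A < t, np.ones(K, dtype=int)) == K)[0] - K + 1
print(j)  # 4, since A[4:7] = [1, 1, 1] is the first full window below t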
Approach #2
With binary-erosion -
from scipy.ndimage import binary_erosion
np.flatnonzero(binary_erosion(A<t, np.ones(K), origin=-(K//2)))[0]
This runs a sliding kernel of the same length as the window and erodes away all windows that don't contain window-length True values in sequence, leaving us with the valid ones. Again, use flatnonzero to get the indices, and finally select the first one. We need the origin argument to binary_erosion so that the marked positions are the window starts.
Approach #3
Here's another with island finding -
# Get mask of valid elements with comparison against thresh
mask = np.r_[False,A<t,False]
# Get indices of starts and ends for the valid islands
idx = np.flatnonzero(mask[:-1] != mask[1:])
start,stop = idx[::2],idx[1::2]
# Get the island lengths and check for lengths >=K and mask start indices
# and select the first one among them
out = start[(stop - start)>=K][0]
I am trying to teach myself Dynamic Programming, and ran into this problem from MIT.
We are given a checkerboard which has 4 rows and n columns, and
has an integer written in each square. We are also given a set of 2n pebbles, and we want to
place some or all of these on the checkerboard (each pebble can be placed on exactly one square)
so as to maximize the sum of the integers in the squares that are covered by pebbles. There is
one constraint: for a placement of pebbles to be legal, no two of them can be on horizontally or
vertically adjacent squares (diagonal adjacency is ok).
(a) Determine the number of legal patterns that can occur in any column (in isolation, ignoring
the pebbles in adjacent columns) and describe these patterns.
Call two patterns compatible if they can be placed on adjacent columns to form a legal placement.
Let us consider subproblems consisting of the first k columns, 1 ≤ k ≤ n. Each subproblem can
be assigned a type, which is the pattern occurring in the last column.
(b) Using the notions of compatibility and type, give an O(n)-time dynamic programming algorithm for computing an optimal placement.
OK, so for part (a): there are 8 possible patterns, since a column of 4 squares admits the empty pattern, 4 single-pebble patterns, and 3 two-pebble patterns with no vertical adjacency ({1,3}, {1,4}, {2,4}).
For part b, I'm unsure, but this is where I'm headed:
Split into sub-problems. Assume 1 ≤ i ≤ n.
1. Define Cj[i] to be the optimal value by pebbling columns 0,...,i, such that column i has pattern type j.
2. Create 8 separate arrays of n elements each, one per pattern type.
I am not sure where to go from here. I realize there are solutions to this problem online, but the solutions don't seem very clear to me.
You're on the right track. As you examine each new column, you will end up computing all possible best-scores up to that point.
Let's say you built your compatibility list (a 2D array) and called it L, such that for each pattern i, L[i] holds the one or more patterns compatible with i.
Now, you examine column j. First, you compute that column's isolated scores for each pattern i; call them Sj[i]. For each pattern i and each compatible
pattern x in L[i], you want to maximize the total score, i.e. Cj[x] = max(Cj[x], Cj-1[i] + Sj[x]). This is a simple array test and update (if bigger).
In addition, you store the pebbling pattern that led to each score. When you update Cj[x] (i.e. you increase its score from its present value), remember the preceding pattern that caused the update as Pj[x] = i. That says "pattern x gave the best result, given the preceding pattern i".
When you are all done, just find the pattern i with the best score Cn[i]. You can then backtrack using Pj to recover the pebbling pattern from each column that led to this result.
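A minimal Python sketch of this scheme (the pattern encoding, helper names, and sample board are my own choices; the Pj backtracking is omitted for brevity):

from itertools import combinations

def legal_patterns(rows=4):
    """All row-subsets with no two vertically adjacent pebbles (8 of them for rows=4)."""
    pats = []
    for r in range(rows + 1):
        for combo in combinations(range(rows), r):
            if all(b - a > 1 for a, b in zip(combo, combo[1:])):
                pats.append(frozenset(combo))
    return pats

def best_pebbling(board):
    """board is a 4 x n list of lists of integers; returns the optimal score."""
    n = len(board[0])
    pats = legal_patterns()
    # Two patterns are compatible iff they share no row (no horizontal adjacency).
    compat = {p: [q for q in pats if not (p & q)] for p in pats}

    def col_score(p, j):
        return sum(board[r][j] for r in p)

    # C[p]: best total for columns 0..j when column j carries pattern p.
    C = {p: col_score(p, 0) for p in pats}
    for j in range(1, n):
        C = {x: max(C[i] for i in compat[x]) + col_score(x, j) for x in pats}
    return max(C.values())

print(best_pebbling([[1, 2], [3, 4], [5, 6], [7, 8]]))  # 18: rows {1,3}, then rows {0,2}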
I made the original Battleship and now I'm looking to upgrade my AI from random guessing to guessing statistically probable locations. I'm having trouble finding algorithms online, so my question is: what kinds of algorithms already exist for this application, and how would I implement one?
Ships: 5, 4, 3, 3, 2
Field: 10x10
Board:
OCEAN = "O"
FIRE = "X"
HIT = "*"
SIZE = 10
SEA = [] # Blank Board
for x in range(SIZE):
    SEA.append([OCEAN] * SIZE)
If you'd like to see the rest of the code, I posted it here: (https://github.com/Dbz/Battleship/blob/master/BattleShip.py); I didn't want to clutter the question with a lot of irrelevant code.
The ultimate naive solution would be to go through every possible placement of ships (legal, given what information is known) and count the number of times each square is occupied.
Obviously, on a relatively empty board this will not work, as there are too many permutations, but a good start might be:
For each square on the board: go through all ships and count in how many different ways each one fits over that square, i.e. for each square along the ship's length, check if it fits horizontally and vertically (a sketch of this count follows below).
An improvement might be to also check, for each possible ship placement, whether the rest of the ships can be placed legally while covering all known 'hits' (places known to contain a ship).
To improve performance: if only one ship can be placed in a given spot, you no longer need to test it on other spots. Also, when there are many 'hits', it might be quicker to first cover all known 'hits' and, for each possible cover, go through the rest.
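Here is a minimal sketch of that basic per-square count (the function and the misses-only handling are my simplification; known hits and ship interactions are ignored):

import numpy as np

SIZE = 10
SHIPS = [5, 4, 3, 3, 2]

def placement_counts(misses):
    """For each square, count how many individual ship placements cover it,
    skipping placements that overlap a known miss ((row, col) pairs)."""
    counts = np.zeros((SIZE, SIZE), dtype=int)
    for length in SHIPS:
        for r in range(SIZE):
            for c in range(SIZE):
                candidates = []
                if c + length <= SIZE:  # horizontal placement starting at (r, c)
                    candidates.append([(r, c + k) for k in range(length)])
                if r + length <= SIZE:  # vertical placement starting at (r, c)
                    candidates.append([(r + k, c) for k in range(length)])
                for cells in candidates:
                    if not any(cell in misses for cell in cells):
                        for rr, cc in cells:
                            counts[rr, cc] += 1
    return counts

heat = placement_counts(misses=set())               # empty board
target = np.unravel_index(np.argmax(heat), heat.shape)
print(target)                                       # a centre-ish square on an empty board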
Edit: you might want to look into DFS.
Edit: elaboration on the OP's (@Dbz) suggestion in the comments:
Hold a set of dismissed ship placements ('dismissed'); a placement can be represented as a string, say "4V5x3" for the vertical placement of the length-4 ship covering 5x3, 5x4, 5x5, 5x6. After a guess, add all the placements the guess dismisses. Also, for each square, hold the set of placements that intersect it ('placements[x,y]'). The probability would then be:
(34 - |intersection(placements[x,y], dismissed)|) / (3400 - |dismissed|)
To add to the dismissed list:
if the guess at (X,Y) is a miss, add placements[x,y]
if the guess at (X,Y) is a hit:
add the neighboring placements (assuming that ships cannot be placed adjacently), i.e. add:
<(2,3a,3b,4,5)>H<X+1>x<Y>, <(2,3a,3b,4,5)>V<X>x<Y+1>
<(2,3a,3b,4,5)>H<X-(2,3,3,4,5)>x<Y>, <(2,3a,3b,4,5)>V<X>x<Y-(2,3,3,4,5)>
2H<X+-1>x<Y+(-2 to 1)>, 3aH<X+-1>x<Y+(-3 to 1)> ...
2V<X+(-2 to 1)>x<Y+-1>, 3aV<X+(-3 to 1)>x<Y+-1> ...
if |intersection(placements[x,y], dismissed)| == 33, i.e. only one placement is possible, add the ship (see later)
check if any of the previous hits has only one possible placement left; if so, add the ship
check whether any of the ships has only one possible placement left; if so, add the ship
adding a ship:
add all other placements of that ship to dismissed
for each (x,y) of the ship's placement, add placements[x,y] without the actual placement
for each (x,y) of the ship's placement, mark it as a hit guess (if not already known) and run stage 2
for each (x,y) neighboring the ship's placement, mark it as a miss guess (if not already known) and run stage 1
run stages 3 and 4.
I might have overcomplicated this, and there may be some redundant actions, but you get the point.
Nice question, and I like your idea for a statistical approach.
I think I would have tried a machine learning approach for this problem as follows:
First model your problem as a classification problem.
The classification problem is: Given a square (x,y) - you want to tell the likelihood of having a ship in this square. Let this likelihood be p.
Next, you need to develop some 'features'. You can take the surroundings of (x,y) [as you might have partial knowledge of them] as your features.
For example, here are the features for the middle of the following mini-board ('+' indicates the square you want to classify as ship/no ship):
OO*
O+*
?O?
can be something like:
f1 = (0,0) = false
f2 = (0,1) = false
f3 = (0,2) = true
f4 = (1,0) = false
(note: (1,1) is skipped, since it is the square being classified)
f5 = (1,2) = true
f6 = (2,0) = unknown
f7 = (2,1) = false
f8 = (2,2) = unknown
I'd implement the features relative to the point of origin (in this case (1,1)) and not as absolute locations on the board (so the square just above (3,3) would also be f2).
Now, create a training set. The training set is a 'labeled' set of features - based on some real boards. You can create it manually (create a lot of boards), automatically by a random generator of placements, or by some other data you can gather.
Feed the training set to a learning algorithm. The algorithm should be able to handle 'unknowns' and be able to give probability of "true" and not only a boolean answer. I think a variation of Naive Bayes can fit well here.
After you have got a classifier - exploit it with your AI.
When it's your turn, choose to fire upon the square with the maximal value of p. At first the shots will be fairly random, but the more shots you fire, the more information you have about the board, and the AI will exploit it for better predictions.
Note that I based the features on a square of size 1 around the target. You can of course choose any k and compute features on a bigger square; that gives you more features, but each might be less informative. There is no rule of thumb for which will be better; it should be tested.
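To make the idea concrete, here is a minimal sketch with scikit-learn's CategoricalNB (the encoding 0 = no ship, 1 = ship, 2 = unknown and the toy training rows are mine; a real training set would be generated from many random legal boards):

from sklearn.naive_bayes import CategoricalNB

# Each sample: the 8 surrounding squares f1..f8, encoded categorically.
X_train = [
    [0, 0, 1, 0, 1, 2, 0, 2],   # the mini-board above
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 2, 2, 1, 0, 0, 2, 0],
    [0, 1, 0, 2, 2, 0, 1, 0],
]
y_train = [1, 0, 1, 0]          # 1 = there was a ship in the centre square

clf = CategoricalNB().fit(X_train, y_train)
p = clf.predict_proba([[0, 0, 1, 0, 1, 2, 0, 2]])[0][1]
print(p)                        # the likelihood p for this surrounding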
The main question is: how are you going to find the statistically probable locations? Are they already known, or do you want to figure them out?
Either way, I'd just make the grid weighted. In your case, the initial weight for each slot would be 1.0/(SIZE^2). The sum of the weights must equal 1.
You can then adjust weights based on the statistics gathered from N last played games.
Now, when your AI makes a choice, it chooses a coordinate to hit based on the weighted probabilities. The quick and simple way to do that would be:
Generate a random number R in range [0..1]
Start from slot (0, 0) adding the weights, i.e. S = W(0, 0) + W(0, 1) + .... where W(n, m) is the weight of the corresponding slot. Once S >= R, you've got the coordinate to hit.
This can be optimised by pre-calculating cumulative weights for each row, have fun :)
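A quick sketch of that selection step (uniform initial weights; names are mine):

import random

SIZE = 10
# Uniform initial weights; adjust these from the statistics of past games.
W = [[1.0 / (SIZE * SIZE)] * SIZE for _ in range(SIZE)]

def pick_target(W):
    """Pick a slot with probability proportional to its weight."""
    r = random.random()
    s = 0.0
    for i in range(SIZE):
        for j in range(SIZE):
            s += W[i][j]
            if s >= r:
                return (i, j)
    return (SIZE - 1, SIZE - 1)  # guard against floating-point rounding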
Find out which ships are still alive:
alive = [2,2,3,4] # length of alive ships
Find the spots where you have not shot yet, for example with numpy.where().
Loop over spots where you can shoot.
Check the sides of the given position: go left and right, how many free spaces? Go up and down, how many? If you can fit a boat in that many spaces, you can fit any smaller boat too, so run this loop from the largest ship downwards and add +1 to this position's count for each alive ship that fits (see the sketch after this list).
Once you have done all of this, the position with the most points should be the most probable one at which to attack and hit something.
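A rough sketch of that counting loop, reusing the OCEAN/SIZE constants from the question (the helper names are mine; 'board' holds the shot history):

OCEAN, SIZE = "O", 10

def free_run(board, r, c, dr, dc):
    """Consecutive un-shot squares from (r, c), moving in direction (dr, dc)."""
    k = 0
    rr, cc = r + dr, c + dc
    while 0 <= rr < SIZE and 0 <= cc < SIZE and board[rr][cc] == OCEAN:
        k += 1
        rr, cc = rr + dr, cc + dc
    return k

def score(board, r, c, alive):
    """One point per alive ship length that could pass through (r, c)."""
    horiz = free_run(board, r, c, 0, -1) + free_run(board, r, c, 0, 1) + 1
    vert = free_run(board, r, c, -1, 0) + free_run(board, r, c, 1, 0) + 1
    return sum((length <= horiz) + (length <= vert) for length in alive)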
Of course, this can get as complicated as you want. You can also ask yourself, instead of "which is my next hit?", which combination of hits will win the game in the fewest moves, or any other parametrization of the problem. Good luck!
I'm trying to solve a problem related to graphs in Python. Since it's a competitive programming problem, I'm not using any third-party packages.
The problem presents a graph in the form of a 5 x 5 square grid.
A bot is assumed to be at a user-supplied position on the grid. The grid is indexed from (0,0) at the top left to (4,4) at the bottom right. Each cell in the grid is represented by one of the following 3 characters: 'b' (ASCII 98) indicates the bot's current position, 'd' (ASCII 100) indicates a dirty cell, and '-' (ASCII 45) indicates a clean cell.
For example, below is a sample grid where the bot is at (0, 0):
b---d
-d--d
--dd-
--d--
----d
The goal is to clean all the cells in the grid in the minimum number of steps.
A step is defined as a task where either
i) the bot changes its position, or
ii) the bot changes the state of a cell (from 'd' to '-').
Assume that the cell initially marked 'b' does not need to be cleaned. The bot is allowed to move UP, DOWN, LEFT and RIGHT.
My approach
I've read a couple of tutorials on graphs and decided to model the grid as a 25 x 25 adjacency matrix, with 0 representing no edge and 1 representing an edge (since we can move in only 4 directions). Next, I decided to apply the Floyd–Warshall all-pairs shortest paths algorithm to it, and then sum up the values of the paths.
But I have a feeling that it won't work.
I'm in a dilemma: the problem is either one of the following:
i) a minimum spanning tree (which I'm unable to build, as I'm not able to model and store the grid as a graph), or
ii) A* search (again a wild guess, with the same problem: I'm not able to model the grid as a graph properly).
I'd be thankful if you could suggest a good approach to problems like these. Also, some hints and pseudocode about various forms of graph-based problems (or links to those) would be helpful. Thanks!
I think you're asking two questions here.
1. How do I represent this problem as a graph in Python?
As the robot moves around, he'll be moving from one dirty square to another, sometimes passing through some clean spaces along the way. Your job is to figure out the order in which to visit the dirty squares.
# Code is untested and may contain typos. :-)

# A list of the (x, y) coordinates of all of the dirty squares.
dirty_squares = [(0, 4), (1, 1)]  # etc.
n = len(dirty_squares)

# Everywhere after here, refer to dirty squares by their index
# into dirty_squares.

def compute_distance(i, j):
    return (abs(dirty_squares[i][0] - dirty_squares[j][0])
            + abs(dirty_squares[i][1] - dirty_squares[j][1]))

# distances[i][j] is the cost to move from dirty square i to
# dirty square j.
distances = []
for i in range(n):
    distances.append([compute_distance(i, j) for j in range(n)])

# The (x, y) coordinates of where the robot starts.
start_node = (0, 0)

# first_move_distances[i] is the cost to move from the robot's
# start location to dirty square i.
first_move_distances = [
    abs(start_node[0] - dirty_squares[i][0])
    + abs(start_node[1] - dirty_squares[i][1])
    for i in range(n)]

# order is a list of dirty-square indices.
def cost(order):
    if not order:
        return 0  # Cleaning 0 dirty squares is free.
    return (first_move_distances[order[0]]
            + sum(distances[order[i]][order[i+1]]
                  for i in range(len(order)-1)))
Your goal is to find a way to reorder list(range(n)) that minimizes the cost.
2. How do I find the minimum number of moves to solve this problem?
As others have pointed out, the generalized form of this problem is intractable (NP-Hard). You have two pieces of information that help constrain the problem to make it tractable:
The graph is a grid.
There are at most 24 dirty squares.
I like your instinct to use A* here. It's often good for solving find-the-minimum-number-of-moves problems. However, A* requires a fair amount of code. I think you'd be better off going with a branch-and-bound approach (sometimes called branch-and-prune), which should be almost as efficient but is much easier to implement.
The idea is to start enumerating all possible solutions using a depth-first-search, like so:
# Each list represents a sequence of dirty nodes.
[]
[1]
[1, 2]
[1, 2, 3]
[1, 3]
[1, 3, 2]
[2]
[2, 1]
[2, 1, 3]
Every time you're about to recurse into a branch, check to see if that branch is more expensive than the cheapest solution found so far. If so, you can skip the whole branch.
If that's not efficient enough, add a function to calculate a lower bound on the remaining cost. Then if cost([2]) + lower_bound(set([1, 3])) is more expensive than the cheapest solution found so far, you can skip the whole branch. The tighter lower_bound() is, the more branches you can skip.
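Here is a sketch of that search, reusing n, distances, and first_move_distances from the earlier snippet. lower_bound() is left as the trivial bound; tightening it (e.g. summing each remaining square's nearest-neighbour distance) prunes more branches:

best = [float('inf')]   # best total movement cost found so far

def lower_bound(remaining):
    return 0            # trivial bound; any estimate that never overshoots works

def search(order, remaining, cost_so_far):
    if not remaining:
        best[0] = min(best[0], cost_so_far)
        return
    if cost_so_far + lower_bound(remaining) >= best[0]:
        return          # prune: this branch cannot beat the best solution
    for i in sorted(remaining):
        step = distances[order[-1]][i] if order else first_move_distances[i]
        search(order + [i], remaining - {i}, cost_so_far + step)

search([], set(range(n)), 0)
print(best[0])          # minimum movement cost; cleaning itself adds a constant n steps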
Let's say V = {v | v = b or v = d}, i.e. the bot's cell plus all dirty cells, and form the complete graph G(V, E). You can calculate the cost of each edge in E in O(n^2) time. The problem then becomes exactly: start at a specified vertex and find a shortest path in G that covers all of V.
This is the Traveling Salesman Problem (TSP), studied since 1832.
The problem can certainly be stored as a graph. The cost between nodes (dirty cells) is their Manhattan distance. Ignore the cost of cleaning cells, because that total cost will be the same no matter what path is taken.
This problem looks to me like the minimum rectilinear Steiner tree problem. Unfortunately, that problem is NP-hard, so you'll need to come up with an approximation (a minimum spanning tree based on Manhattan distance), if I am correct.