How to structure a program to work with minesweeper configurations - python

EDIT: This was a while ago and I've since got it working; if you'd like to see the code, it's included at github.com/LewisGaul/minegaulerQt.
I'm trying to write a program to calculate probabilities for the game minesweeper, and have had some difficulty working out how best to structure it. While it may seem quite simple at first with the example below, I would like to know the best way to allow for more complex configurations. Note I am not looking for help with how to calculate probabilities - I know the method, I just need to implement it!
To make it clear what I'm trying to calculate, I will work through a simple example which can be done by hand. Consider a minesweeper configuration
# # # #
# 1 2 #
# # # #
where # represents an unclicked cell. The 1 tells us there is exactly 1 mine in the leftmost 7 unclicked cells, the 2 tells us there are exactly 2 in the rightmost 7. To calculate the probability of each individual cell containing a mine, we need to determine all the different cases (only 2 in this simple case):
1 mine in leftmost 3 cells, 2 mines in rightmost 3 cells (total of 3 mines, 3x3=9 combinations).
1 mine in center 4 cells, 1 mine in rightmost 3 cells (total of 2 mines, 4x3=12 combinations).
Given that the probability of a mine being in a random cell is about 0.2, it is (for a random selection of cells) about 4 times more likely that there is a total of 2 mines rather than a total of 3, so the total number of mines in a configuration matters as well as the number of combinations for each configuration. In this case the probability of case 1 is therefore 9/(9+4x12)=0.158, and the probability of there being a mine in a given leftmost cell is about 0.158/3=0.053, as those cells are effectively equivalent (they share exactly the same revealed neighbours).
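To double-check that arithmetic, here is the calculation spelled out (0.2 is just the assumed overall mine density; a case's weight is its number of combinations scaled by the odds ratio per extra mine):
p = 0.2
odds = p / (1 - p)                    # relative weight per extra mine (0.25)
cases = [(3, 9), (2, 12)]             # (total mines, combinations) for cases 1 and 2
weights = [combs * odds ** mines for mines, combs in cases]
p_case1 = weights[0] / sum(weights)   # ~0.158, i.e. 9/(9 + 4x12)
p_left_cell = p_case1 / 3             # ~0.053 for each of the three leftmost cells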
I have created a GUI with Tkinter which allows me to easily enter configurations such as the one in the example, which stores the grid as a numpy array. I then made a NumberGroup class which isolates each of the clicked/numbered cells, storing the number and a set of the coordinates of its unclicked neighbours. These can be subtracted to get equivalence groups... Although this would not be as straightforward if there were three or more numbers instead of just two. But I am unsure how to go from here to getting the different configurations. I toyed with making a Configuration class, but am not hugely familiar with how different classes should work together. See working code below (numpy required).
Note: I am aware I could have attempted to use a brute force approach, but if possible I would like to avoid that, keeping the equivalent groups separate (in the above example there are 3 equivalence groups, the leftmost 3, the middle 4, the rightmost 3). I would like to hear your thoughts on this.
import numpy as np
grid = np.array(
[[0, 0, 0, 0],
[0, 2, 1, 0],
[0, 0, 0, 0]]
)
dims = (3, 4) #Dimensions of the grid
class NumberGroup(object):
def __init__(self, mines, coords, dims=None):
"""Takes a number of mines, and a set of coordinates."""
if dims:
self.dims = dims
self.mines = mines
self.coords = coords
def __repr__(self):
return "<Group of {} cells with {} mines>".format(
len(self.coords), self.mines)
def __str__(self):
if hasattr(self, 'dims'):
dims = self.dims
else:
dims = (max([c[0] for c in self.coords]) + 1,
max([c[1] for c in self.coords]) + 1)
grid = np.zeros(dims, int)
for coord in self.coords:
grid[coord] = 1
return str(grid).replace('0', '.').replace('1', '#')
def __sub__(self, other):
if type(other) is NumberGroup:
return self.coords - other.coords
elif type(other) is set:
            return self.coords - other
else:
raise TypeError("Can only subtract a group or a set from another.")
def get_neighbours(coord, dims):
x, y = coord
row = [u for u in range(x-1, x+2) if u in range(dims[0])]
col = [v for v in range(y-1, y+2) if v in range(dims[1])]
return {(u, v) for u in row for v in col}
groups = []
all_coords = [(i, j) for i in range(dims[0])
for j in range(dims[1])]
for coord, nr in [(c, grid[c]) for c in all_coords if grid[c] > 0]:
empty_neighbours = {c for c in get_neighbours(coord, dims)
if grid[c] == 0}
if nr > len(empty_neighbours):
print "Error: number {} in cell {} is too high.".format(nr, coord)
break
groups.append(NumberGroup(nr, empty_neighbours, dims))
print groups
for g in groups:
print g
print groups[0] - groups[1]
UPDATE:
I have added a couple of other classes and restructured a bit (see below for working code), and it is now capable of creating and displaying the equivalence groups, which is a step in the right direction. However I still need to work out how to iterate through all the possible mine-configurations, by assigning a number of mines to each group in a way that creates a valid configuration. Any help is appreciated.
For example,
# # # #
# 2 1 #
# # # #
There are three equivalence groups: G1, the left 3 cells; G2, the middle 4; G3, the right 3. I want the code to loop through, assigning mines to the groups in the following way:
G1=2 (max the first group) => G2=0 => G3=1 (this is all configs with G1=2)
G1=1 (decrease by one) => G2=1 => G3=0 (this is all with G1=1)
G1=0 => G2=2 INVALID
So we arrive at both configurations. This needs to work for more complicated setups!
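To illustrate the kind of search I imagine get_configs doing, here is a minimal backtracking sketch (illustrative only; the function name, and the representation of a group as a (cells, numbers) pair with a dict of remaining mine counts, are made up and not the classes below):
def enumerate_configs(groups, remaining, partial=()):
    # groups: list of (cells, numbers) pairs; remaining: mines each number still needs.
    if not groups:
        if all(r == 0 for r in remaining.values()):
            yield partial                         # a valid per-group mine assignment
        return
    (cells, numbers), rest = groups[0], groups[1:]
    limit = min([len(cells)] + [remaining[n] for n in numbers])
    for m in range(limit, -1, -1):                # try the largest count first, as above
        for n in numbers:
            remaining[n] -= m
        for config in enumerate_configs(rest, remaining, partial + (m,)):
            yield config
        for n in numbers:
            remaining[n] += m

# The 2-1 example: the '2' borders G1 and G2, the '1' borders G2 and G3.
example = [(range(3), ['2']), (range(4), ['2', '1']), (range(3), ['1'])]
configs = list(enumerate_configs(example, {'2': 2, '1': 1}))
# configs == [(2, 0, 1), (1, 1, 0)]
The full restructured code: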
import numpy as np
def get_neighbours(coord, dims):
x, y = coord
row = [u for u in range(x-1, x+2) if u in range(dims[0])]
col = [v for v in range(y-1, y+2) if v in range(dims[1])]
return {(u, v) for u in row for v in col}
class NrConfig(object):
def __init__(self, grid):
self.grid = grid
self.dims = grid.shape # Dimensions of grid
self.all_coords = [(i, j) for i in range(self.dims[0])
for j in range(self.dims[1])]
self.numbers = dict()
self.groups = []
self.configs = []
self.get_numbers()
self.get_groups()
self.get_configs()
def __str__(self):
return str(self.grid).replace('0', '.')
def get_numbers(self):
for coord, nr in [(c, self.grid[c]) for c in self.all_coords
if self.grid[c] > 0]:
empty_neighbours = {c for c in get_neighbours(
coord, self.dims) if self.grid[c] == 0}
if nr > len(empty_neighbours):
print "Error: number {} in cell {} is too high.".format(
nr, coord)
return
self.numbers[coord] = Number(nr, coord, empty_neighbours,
self.dims)
def get_groups(self):
coord_neighbours = dict()
for coord in [c for c in self.all_coords if self.grid[c] == 0]:
# Must be a set so that order doesn't matter!
coord_neighbours[coord] = {self.numbers[c] for c in
get_neighbours(coord, self.dims) if c in self.numbers}
while coord_neighbours:
coord, neighbours = coord_neighbours.popitem()
equiv_coords = [coord] + [c for c, ns in coord_neighbours.items()
if ns == neighbours]
for c in equiv_coords:
if c in coord_neighbours:
del(coord_neighbours[c])
self.groups.append(EquivGroup(equiv_coords, neighbours, self.dims))
def get_configs(self):
pass # WHAT GOES HERE?!
class Number(object):
"""Contains information about the group of cells around a number."""
def __init__(self, nr, coord, neighbours, dims):
"""Takes a number of mines, and a set of coordinates."""
self.nr = nr
self.coord = coord
# A list of the available neighbouring cells' coords.
self.neighbours = neighbours
self.dims = dims
def __repr__(self):
return "<Number {} with {} empty neighbours>".format(
int(self), len(self.neighbours))
def __str__(self):
grid = np.zeros(self.dims, int)
grid[self.coord] = int(self)
for coord in self.neighbours:
grid[coord] = 9
return str(grid).replace('0', '.').replace('9', '#')
def __int__(self):
return self.nr
class EquivGroup(object):
"""A group of cells which are effectively equivalent."""
def __init__(self, coords, nrs, dims):
self.coords = coords
# A list of the neighbouring Number objects.
self.nr_neighbours = nrs
self.dims = dims
if self.nr_neighbours:
self.max_mines = min(len(self.coords),
max(map(int, self.nr_neighbours)))
else:
self.max_mines = len(coords)
def __repr__(self):
return "<Equivalence group containing {} cells>".format(
len(self.coords))
def __str__(self):
grid = np.zeros(self.dims, int)
for coord in self.coords:
grid[coord] = 9
for number in self.nr_neighbours:
grid[number.coord] = int(number)
return str(grid).replace('0', '.').replace('9', '#')
grid = np.array(
[[0, 0, 0, 0],
[0, 2, 1, 0],
[0, 0, 0, 0]]
)
config = NrConfig(grid)
print config
print "Number groups:"
for n in config.numbers.values():
print n
print "Equivalence groups:"
for g in config.groups:
print g

If you don't want to brute-force it, you could model the process as a decision tree. Suppose we start with your example:
####
#21#
####
If we want to start placing mines in a valid configuration, we at this point essentially have ten choices. Since it doesn't really matter which square we pick within an equivalence group, we can narrow that down to three choices. The tree branches. Let's go down one branch:
*###
#11#
####
I placed a mine in G1, indicated by the asterisk. I've also updated the numbers (just one number in this case) associated with this equivalence group, to indicate that these numbered squares can now border one fewer mine.
This hasn't reduced our freedom of choice for the following step; we can still place a mine in any of the equivalence groups. Let's place another one in G1:
*XX#
*01#
XXX#
Another asterisk marks the new mine, and the numbered square has again been lowered by one. It has now reached zero, meaning it cannot border any more mines. That means that for our next mine placement, all the equivalence groups dependent upon this numbered square are ruled out. Xs mark squares where we can no longer place any mine. Only one choice remains:
*XX*
*00X
XXXX
Here the branch ends and you've found a valid configuration. By running along all the branches in this tree in this manner, you should find all of them. Here we found your first configuration. Of course, there's more than one way to get there. If we had started by placing a mine in G3, we would have been forced to place the other two in G1. That branch leads to the same configuration, so you should check for duplicates. I don't see a way to avoid this redundancy right now.
The second configuration is found by either starting with G2, or placing one mine in G1 and then the second in G2. In either case you again end up at a branch end:
**XX
X00X
XXXX
Invalid configurations like your example with zero mines in G1 do not pop up. There are no valid choices along the tree that lead you there. Here is the whole tree of valid choices.
Choice 1:    1      |  2  |  3
Choice 2:  1  2  3  |  1  |  1
Choice 3:  3     1  |     |  1
Valid configurations are the branch ends at which no further choice is possible, i.e.
113
12
131
21
311
which obviously fall into two equivalence classes if we disregard the order of the numbers.
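To make that concrete, here is a rough sketch of such a tree walk (my own throwaway structures, not your classes): each numbered square tracks how many mines it still needs and which groups it borders, a group is ruled out once any adjacent number reaches zero, and duplicate branch ends are collapsed by collecting the per-group totals in a set.
def walk(numbers, group_sizes, placed, found):
    if all(n['left'] == 0 for n in numbers):
        found.add(tuple(placed))                  # branch end: record, deduplicated
        return
    for g, size in enumerate(group_sizes):
        adjacent = [n for n in numbers if g in n['groups']]
        if placed[g] < size and adjacent and all(n['left'] > 0 for n in adjacent):
            for n in adjacent:
                n['left'] -= 1                    # placing a mine lowers its numbers
            placed[g] += 1
            walk(numbers, group_sizes, placed, found)
            placed[g] -= 1                        # undo and try the next choice
            for n in adjacent:
                n['left'] += 1

# Your example: the '2' borders G1 and G2, the '1' borders G2 and G3.
numbers = [{'left': 2, 'groups': {0, 1}}, {'left': 1, 'groups': {1, 2}}]
found = set()
walk(numbers, [3, 4, 3], [0, 0, 0], found)
# found == {(2, 0, 1), (1, 1, 0)}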

Related

Spawning objects in groups when the first object of the group was spawned randomly Python

I'm currently doing a project, and in my code I'm trying to get trees (".*.") and mountains (".^.") to spawn in groups around the first tree or mountain, which is spawned randomly. However, I can't figure out how to get the trees and mountains to spawn in groups around that single randomly generated point. Any help?
grid = []
def draw_board():
row = 0
for i in range(0,625):
if grid[i] == 1:
print("..."),
elif grid[i] == 2:
print("..."),
elif grid[i] == 3:
print(".*."),
elif grid[i] == 4:
print(".^."),
elif grid[i] == 5:
print("[T]"),
else:
print("ERR"),
row = row + 1
if row == 25:
print ("\n")
row = 0
return
There are a number of ways you can do it.
Firstly, you can just simulate the groups directly, i.e. pick a range on the grid and fill it with a specific figure.
import random

# figures is assumed to be the list of figure codes to fill the grid with, e.g. [3, 4].
def generate_grid(size):
grid = [0] * size
right = 0
while right < size:
left = right
repeat = min(random.randint(1, 5), size - right) # *
right = left + repeat
grid[left:right] = [random.choice(figures)] * repeat
return grid
Note that the group size need not be uniformly distributed; you can use any convenient distribution, e.g. Poisson.
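For instance, assuming numpy is available, a Poisson-distributed length could replace the starred line (the mean of 3 is arbitrary):
import numpy as np

def poisson_group_length(size, right, mean=3):
    # Hypothetical alternative for the starred line: a Poisson group length,
    # at least 1 and clipped so it does not run past the end of the grid.
    return min(max(1, np.random.poisson(mean)), size - right)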
Secondly, you can use a Markov Chain. In this case group lengths will implicitly follow a Geometric distribution. Here's the code:
def transition_matrix(A):
"""Ensures that each row of transition matrix sums to 1."""
copy = []
for i, row in enumerate(A):
total = sum(row)
copy.append([item / total for item in row])
return copy
def generate_grid(size):
# Transition matrix ``A`` defines the probability of
# changing from figure i to figure j for each pair
# of figures i and j. The grouping effect can be
# obtained by setting diagonal entries A[i][i] to
# larger values.
#
# You need to specify this manually.
A = transition_matrix([[5, 1],
[1, 5]]) # Assuming 2 figures.
grid = [random.choice(figures)]
for i in range(1, size):
current = grid[-1]
next = choice(figures, A[current])
grid.append(next)
return grid
Where the choice function is explained in this StackOverflow answer.
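I won't reproduce the linked answer here, but a weighted choice along these lines is what is meant (on Python 3.6+, random.choices(figures, weights=A[current])[0] does the same thing); it assumes the figures are the integers 0..n-1 so they can index the transition matrix:
import random

def choice(options, weights):
    # Pick one option with probability proportional to its weight.
    r = random.uniform(0, sum(weights))
    upto = 0
    for option, weight in zip(options, weights):
        upto += weight
        if r <= upto:
            return option
    return options[-1]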

8 puzzle using blind search (brute-force) and manhattan distance heuristic

I developed my own program in Python for solving the 8-puzzle. Initially I used "blind" or uninformed search (basically brute-forcing), generating and exploring all possible successors and using breadth-first search. When it finds the "goal" state, it back-tracks to the initial state and delivers (what I believe is) the most optimized sequence of steps to solve it. Of course, there were initial states where the search would take a lot of time and generate over 100,000 states before finding the goal.
Then I added the heuristic - Manhattan distance. The solutions started coming much more quickly and with far fewer explored states. But my confusion is that sometimes the optimized sequence generated was longer than the one reached using blind or uninformed search.
What I am doing is basically this (a condensed sketch follows the list):
For each state, look for all possible moves (up, down, left and right), and generate the successor states.
Check if state is repeat. If yes, then ignore it.
Calculate Manhattan for the state.
Pick out the successor(s) with lowest Manhattan and add at the end of the list.
Check if goal state. If yes, break the loop.
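Condensed into a sketch (my paraphrase for clarity, not the actual program below; it assumes states are hashable and that successors and manhattan are supplied):
def informed_search(start, goal, successors, manhattan):
    frontier = [(start, [start])]                 # (state, path back to the start)
    seen = {start}
    while frontier:
        state, path = frontier.pop(0)             # take states in the order added
        if state == goal:
            return path
        children = [s for s in successors(state) if s not in seen]   # skip repeats
        if not children:
            continue
        best = min(manhattan(s) for s in children)
        for s in children:
            if manhattan(s) == best:              # keep only the lowest-Manhattan successors
                seen.add(s)
                frontier.append((s, path + [s]))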
I am not sure whether this would qualify as greedy best-first search or A*.
My question is: is it an inherent flaw in the Manhattan distance heuristic that it sometimes does not give the most optimal solution, or am I doing something wrong?
Below is the code. I apologize that it is not a very clean code but being mostly sequential it should be simple to understand. I also apologize for a long code - I know I need to optimize it. Would also appreciate any suggestions/guidance for cleaning up the code. Here is what it is:
import numpy as np
from copy import deepcopy
import sys
# calculate Manhattan distance for each digit as per goal
def mhd(s, g):
m = abs(s // 3 - g // 3) + abs(s % 3 - g % 3)
return sum(m[1:])
# assign each digit the coordinate to calculate Manhattan distance
def coor(s):
c = np.array(range(9))
for x, y in enumerate(s):
c[y] = x
return c
#################################################
def main():
goal = np.array( [1, 2, 3, 4, 5, 6, 7, 8, 0] )
rel = np.array([-1])
mov = np.array([' '])
string = '102468735'
inf = 'B'
pos = 0
yes = 0
goalc = coor(goal)
puzzle = np.array([int(k) for k in string]).reshape(1, 9)
rnk = np.array([mhd(coor(puzzle[0]), goalc)])
while True:
loc = np.where(puzzle[pos] == 0) # locate '0' (blank) on the board
loc = int(loc[0])
child = np.array([], int).reshape(-1, 9)
cmove = []
crank = []
# generate successors on possible moves - new states no repeats
if loc > 2: # if 'up' move is possible
succ = deepcopy(puzzle[pos])
succ[loc], succ[loc - 3] = succ[loc - 3], succ[loc]
if ~(np.all(puzzle == succ, 1)).any(): # repeat state?
child = np.append(child, [succ], 0)
cmove.append('up')
crank.append(mhd(coor(succ), goalc)) # manhattan distance
if loc < 6: # if 'down' move is possible
succ = deepcopy(puzzle[pos])
succ[loc], succ[loc + 3] = succ[loc + 3], succ[loc]
if ~(np.all(puzzle == succ, 1)).any(): # repeat state?
child = np.append(child, [succ], 0)
cmove.append('down')
crank.append(mhd(coor(succ), goalc))
if loc % 3 != 0: # if 'left' move is possible
succ = deepcopy(puzzle[pos])
succ[loc], succ[loc - 1] = succ[loc - 1], succ[loc]
if ~(np.all(puzzle == succ, 1)).any(): # repeat state?
child = np.append(child, [succ], 0)
cmove.append('left')
crank.append(mhd(coor(succ), goalc))
if loc % 3 != 2: # if 'right' move is possible
succ = deepcopy(puzzle[pos])
succ[loc], succ[loc + 1] = succ[loc + 1], succ[loc]
if ~(np.all(puzzle == succ, 1)).any(): # repeat state?
child = np.append(child, [succ], 0)
cmove.append('right')
crank.append(mhd(coor(succ), goalc))
for s in range(len(child)):
if (inf in 'Ii' and crank[s] == min(crank)) \
or (inf in 'Bb'):
puzzle = np.append(puzzle, [child[s]], 0)
rel = np.append(rel, pos)
mov = np.append(mov, cmove[s])
rnk = np.append(rnk, crank[s])
if np.array_equal(child[s], goal):
print()
print('Goal achieved!. Successors generated:', len(puzzle) - 1)
yes = 1
break
if yes == 1:
break
pos += 1
# generate optimized steps by back-tracking the steps to the initial state
optimal = np.array([], int).reshape(-1, 9)
last = len(puzzle) - 1
optmov = []
rank = []
while last != -1:
optimal = np.insert(optimal, 0, puzzle[last], 0)
optmov.insert(0, mov[last])
rank.insert(0, rnk[last])
last = int(rel[last])
# show optimized steps
optimal = optimal.reshape(-1, 3, 3)
print('Total optimized steps:', len(optimal) - 1)
print()
for s in range(len(optimal)):
print('Move:', optmov[s])
print(optimal[s])
print('Manhattan Distance:', rank[s])
print()
print()
################################################################
# Main Program
if __name__ == '__main__':
main()
Here are some of the initial states and the optimized steps calculated if you would like to check (above code would give this option to choose between blind vs Informed search)
Initial states
- 283164507 Blind: 19 Manhattan: 21
- 243780615 Blind: 15 Manhattan: 21
- 102468735 Blind: 11 Manhattan: 17
- 481520763 Blind: 13 Manhattan: 23
- 723156480 Blind: 16 Manhattan: 20
I have deliberately chosen examples where results would be quick (within seconds or few minutes).
Your help and guidance would be much appreciated.
Edit: I have made some quick changes and managed to reduce some 30+ lines. Unfortunately can't do much at this time.
Note: I have hardcoded the initial state and the blind vs informed choice. Please change the value of variable "string" for initial state and the variable "inf" [I/B] for Informed/Blind. Thanks!

Referencing a conditional random element of an array and replacing it

This is my second question post on StackOverflow relating to coding in Python/Numpy.
I feel like there is definitely some sort of function which does the pseudocode:
np.random.choice([a[i-1,j],a[i+1,j],a[i,j-1],a[i,j+1]])==0 = 9
Essentially, I would like the random function to select a cell adjacent to mine (up, down, left, right) with the value 0, and replace said cell with a 9
Unfortunately, I know why the code I typed is illegal: the first half of the statement returns a True/False boolean, because I have used a comparison operator, and I can't assign the value 9 to that.
If I split this into two steps and used an if statement with random.choice (looking for an adjacent element equal to zero), I would then need some way of recalling which cell (up, down, left or right) the random generator originally selected, so that I can then set it to 9.
Kind Regards,
EDIT: I may as well attach a sample code, so you can simply just run this (I am including my error)
import numpy as np

a = np.empty((6,6,))
a[:] = 0
a[2,3]=a[3,3]=a[2,4] = 1
for (i,j), value in np.ndenumerate(a):
if a[i,j]==1:
np.random.choice([a[i-1,j],a[i+1,j],a[i,j-1],a[i,j+1]])==0 = 9
You could select from a range of directions (up, down, left, right) that map to specific coordinate movements in the 2D array, like this:
# generate a dataset
a = np.zeros((6,6))
a[2,3]=a[3,3]=a[2,4] = 1
# map directions to coordinate movements
nesw_map = {'left': [-1, 0], 'top': [0, 1], 'right': [1,0], 'bottom': [0,-1]}
directions = list(nesw_map.keys())  # list() so np.random.choice can sample from it
# select only those places where a == 1
for row_ind, col_ind in zip(*np.where(a == 1)): # np.where returns (rows, cols); more efficient than iterating over the entire array
    x = np.random.choice(directions)
    elm_coords = row_ind + nesw_map[x][0], col_ind + nesw_map[x][1]
if a[elm_coords] == 0:
a[elm_coords] = 9
Note that this does not do any type of bounds checking (so if a 1 appears at the edge, you might select an item "off the grid" which will result in an error).
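A sketch of a bounds-checked write, reusing the (row, col) coordinates from the snippet above (numpy would otherwise wrap negative indices around rather than raising an error):
def place_if_free(a, coords, value=9):
    r, c = coords
    if 0 <= r < a.shape[0] and 0 <= c < a.shape[1] and a[r, c] == 0:
        a[r, c] = value
        return True
    return False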
This is the most "basic" way of getting what you need (Adding a try/except statement provides error checking, so you can prevent any unwanted errors):
import random,numpy
a = numpy.empty((6,6,))
a[:] = 0
a[2,3]=a[3,3]=a[5,5] = 1
for (i,j), value in numpy.ndenumerate(a):
var = 0
if a[i,j]==1:
while var==0:
x=random.randrange(0,4) #Generate a random number
try:
if x==0 and a[i-1,j]==0:
a[i-1,j] =9 #Do this if x = 0
elif x==1 and a[i+1,j]==0:
a[i+1,j] =9 #Do this if x = 1
elif x==2 and a[i,j-1]==0:
a[i,j-1] =9 #Do this if x = 2
elif x==3 and a[i,j+1]==0:
a[i,j+1] =9 #Do this if x = 3
var=1
except:
var=0
print a

Algorithm for matching objects

I have 1,000 objects, each object has 4 attribute lists: a list of words, images, audio files and video files.
I want to compare each object against:
a single object, Ox, from the 1,000.
every other object.
A comparison will be something like:
sum(words in common + images in common + ...).
I want an algorithm that will help me find the closest 5 (say) objects to Ox, and (a different?) algorithm to find the closest 5 pairs of objects.
I've looked into cluster analysis and maximal matching and they don't seem to exactly fit this scenario. I don't want to use these methods if something more apt exists, so does this look like a particular type of algorithm to anyone, or can anyone point me in the right direction for applying the algorithms I mentioned to this?
I made an example program showing how to solve your first question, but you have to implement how you want to compare images, audio and video files, and I assume every object has the same length for all lists. The answer to your second question would be something similar, but with a double loop.
import numpy as np
from random import randint
class Thing:
def __init__(self, words, images, audios, videos):
self.words = words
self.images = images
self.audios = audios
self.videos = videos
def compare(self, other):
score = 0
# Assuming the attribute lists have the same length for both objects
# and that they are sorted in the same manner:
for i in range(len(self.words)):
if self.words[i] == other.words[i]:
score += 1
for i in range(len(self.images)):
if self.images[i] == other.images[i]:
score += 1
# And so one for audio and video. You have to make sure you know
# what method to use for determining when an image/audio/video are
# equal.
return score
N = 1000
things = []
words = np.random.randint(5, size=(N,5))
images = np.random.randint(5, size=(N,5))
audios = np.random.randint(5, size=(N,5))
videos = np.random.randint(5, size=(N,5))
# For testing purposes I assign each attribute to a list (array) containing
# five random integers. I don't know how you actually intend to do it.
for i in xrange(N):
things.append(Thing(words[i], images[i], audios[i], videos[i]))
# I will assume that object number 999 (i=999) is the Ox:
ox = 999
scores = np.zeros(N - 1)
for i in xrange(N - 1):
scores[i] = (things[ox].compare(things[i]))
best = np.argmax(scores)
print "The most similar thing is thing number %d." % best
print
print "Ox attributes:"
print things[ox].words
print things[ox].images
print things[ox].audios
print things[ox].videos
print
print "Best match attributes:"
print things[best].words
print things[best].images
print things[best].audios
print things[best].videos
EDIT:
Now here is the same program modified slightly to answer your second question. It turned out to be very simple. I basically just needed to add 4 lines:
1. Changing scores into an (N, N) array instead of just (N).
2. Adding for j in xrange(N): and thus creating a double loop.
3. if i == j:
4. break
where 3. and 4. are just there to make sure that I only compare each pair of things once, not twice, and don't compare any things with themselves.
Then there are a few more lines of code needed to extract the indices of the 5 largest values in scores. I also reformatted the printing so it is easy to confirm by eye that the printed pairs are actually very similar.
Here comes the new code:
import numpy as np
class Thing:
def __init__(self, words, images, audios, videos):
self.words = words
self.images = images
self.audios = audios
self.videos = videos
def compare(self, other):
score = 0
# Assuming the attribute lists have the same length for both objects
# and that they are sorted in the same manner:
for i in range(len(self.words)):
if self.words[i] == other.words[i]:
score += 1
for i in range(len(self.images)):
if self.images[i] == other.images[i]:
score += 1
for i in range(len(self.audios)):
if self.audios[i] == other.audios[i]:
score += 1
for i in range(len(self.videos)):
if self.videos[i] == other.videos[i]:
score += 1
# You have to make sure you know what method to use for determining
# when an image/audio/video are equal.
return score
N = 1000
things = []
words = np.random.randint(5, size=(N,5))
images = np.random.randint(5, size=(N,5))
audios = np.random.randint(5, size=(N,5))
videos = np.random.randint(5, size=(N,5))
# For testing purposes I assign each attribute to a list (array) containing
# five random integers. I don't know how you actually intend to do it.
for i in xrange(N):
things.append(Thing(words[i], images[i], audios[i], videos[i]))
################################################################################
############################# This is the new part: ############################
################################################################################
scores = np.zeros((N, N))
# Scores will become a triangular matrix where scores[i, j]=value means that
# value is the number of attributes thing[i] and thing[j] have in common.
for i in xrange(N):
for j in xrange(N):
if i == j:
break
# Break the loop here because:
# * When i==j we would compare thing[i] with itself, and we don't
# want that.
# * For every combination where j>i we would repeat all the
# comparisons for j<i and create duplicates. We don't want that.
scores[i, j] = (things[i].compare(things[j]))
# I want the 5 most similar pairs:
n = 5
# This list will contain a tuple for each of the n most similar pairs:
best_list = []
for k in xrange(n):
    ij = np.argmax(scores) # Returns a single flattened index: ij = i*N + j
    i = ij // N
j = ij % N
best_list.append((i, j))
# Erease this score so that on next iteration the second largest score
# is found:
scores[i, j] = 0
for k, (i, j) in enumerate(best_list):
# The number 1 most similar pair is the BEST match of all.
# The number N most similar pair is the WORST match of all.
print "The number %d most similar pair is thing number %d and %d." \
% (k+1, i, j)
print "Thing%4d:" % i, \
things[i].words, things[i].images, things[i].audios, things[i].videos
print "Thing%4d:" % j, \
things[j].words, things[j].images, things[j].audios, things[j].videos
print
If your comparison works with "create a sum of all features and find those with the closest sum", there is a simple trick to get close objects:
Put all objects into an array
Calculate all the sums
Sort the array by sum.
If you take any index, the objects close to it will now have a close index as well. So to find the 5 closest objects, you just need to look at index+5 to index-5 in the sorted array.
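As a rough sketch (sums holds whatever feature totals you compute in the "calculate all the sums" step; the function name is mine):
def closest_by_sum(sums, ox_index, k=5):
    # sums[i] is the precomputed feature total of object i
    order = sorted(range(len(sums)), key=lambda i: sums[i])       # sort by sum
    pos = order.index(ox_index)
    window = [i for i in order[max(0, pos - k):pos + k + 1] if i != ox_index]
    # of the neighbours in sorted order, keep the k with the closest sums
    return sorted(window, key=lambda i: abs(sums[i] - sums[ox_index]))[:k]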

Python 3.3.2 - 'Grouping' System with Characters

I have a fun little problem.
I need to count the amount of 'groups' of characters in a file. Say the file is...
..##.#..#
##..####.
.........
###.###..
##...#...
The code will then count the amount of groups of #'s. For example, the above would be 3. It includes diagonals. Here is my code so far:
build = []
height = 0
with open('file.txt') as i:
build.append(i)
height += 1
length = len(build[0])
dirs = {'up':(-1, 0), 'down':(1, 0), 'left':(0, -1), 'right':(0, 1), 'upleft':(-1, -1), 'upright':(-1, 1), 'downleft':(1, -1), 'downright':(1, 1)}
def find_patches(grid, length):
queue = []
queue.append((0, 0))
patches = 0
while queue:
current = queue.pop(0)
line, cell = path[-1]
if ## This is where I am at. I was making a pathfinding system.
Here’s a naive solution I came up with. Originally I just wanted to loop through all the elements once and check for each whether I could put it into an existing group. That didn’t work, however, as some groups are only combined later (e.g. the first # in the second row would not belong to the big group until the second # in that row is processed). So I started working on a merge algorithm and then figured I could just do that from the beginning.
So how this works now is that I put every # into its own group. Then I keep looking at combinations of two groups and check whether they are close enough to each other that they belong to the same group. If that’s the case, I merge them and restart the check. Once I have looked at all possible combinations and could not merge any more, I know that I’m done.
from itertools import combinations, product
def canMerge (g, h):
for i, j in g:
for x, y in h:
if abs(i - x) <= 1 and abs(j - y) <= 1:
return True
return False
def findGroups (field):
# initialize one-element groups
groups = [[(i, j)] for i, j in product(range(len(field)), range(len(field[0]))) if field[i][j] == '#']
# keep joining until no more joins can be executed
merged = True
while merged:
merged = False
for g, h in combinations(groups, 2):
if canMerge(g, h):
g.extend(h)
groups.remove(h)
merged = True
break
return groups
# initialize field
field = '''\
..##.#..#
##..####.
.........
###.###..
##...#...'''.splitlines()
groups = findGroups(field)
print(len(groups)) # 3
I'm not exactly sure what your code is trying to do. Your with statement opens a file, but all you do is append the file object to a list before the with ends and it gets closed (without its contents ever being read). I suspect this is not what you intend, but I'm not sure what you were aiming for.
If I understand your problem correctly, you are trying to count the connected components of a graph. In this case, the graph's vertices are the '#' characters, and the edges are wherever such characters are adjacent to each other in any direction (horizontally, vertically or diagonally).
There are pretty simple algorithms for solving that problem. One is to use a disjoint set data structure (also known as a "union-find" structure, since union and find are the two operations it supports) to connect groups of '#' characters together as they're read in from the file.
Here's a fairly minimal disjoint set I wrote to answer another question a while ago:
class UnionFind:
def __init__(self):
self.rank = {}
self.parent = {}
def find(self, element):
if element not in self.parent: # leader elements are not in `parent` dict
return element
leader = self.find(self.parent[element]) # search recursively
self.parent[element] = leader # compress path by saving leader as parent
return leader
def union(self, leader1, leader2):
rank1 = self.rank.get(leader1,1)
rank2 = self.rank.get(leader2,1)
if rank1 > rank2: # union by rank
self.parent[leader2] = leader1
elif rank2 > rank1:
self.parent[leader1] = leader2
else: # ranks are equal
self.parent[leader2] = leader1 # favor leader1 arbitrarily
self.rank[leader1] = rank1+1 # increment rank
And here's how you can use it for your problem, using x, y tuples for the nodes:
nodes = set()
groups = UnionFind()
with open('file.txt') as f:
for y, line in enumerate(f): # iterate over lines
for x, char in enumerate(line): # and characters within a line
if char == '#':
nodes.add((x, y)) # maintain a set of node coordinates
# check for neighbors that have already been read
neighbors = [(x-1, y-1), # up-left
(x, y-1), # up
(x+1, y-1), # up-right
(x-1, y)] # left
for neighbor in neighbors:
if neighbor in nodes:
my_group = groups.find((x, y))
neighbor_group = groups.find(neighbor)
if my_group != neighbor_group:
groups.union(my_group, neighbor_group)
# finally, count the number of unique groups
number_of_groups = len(set(groups.find(n) for n in nodes))
