I'm trying to match up the elements in 2 different arrays. Array_A is a 3d map of A_Clouds, Array_B is a 3d map of B_Clouds. Each "cloud" is continuous, i.e. any isolated pixels would define a new cloud. The values of the pixels are a single, unique integer for each cloud. Non-cloud values are 0. Here's a 2D example:
[[0 0 0 0 0 0 0 0 0]
[0 0 0 1 1 1 0 0 0]
[0 0 1 1 1 1 1 1 0]
[0 0 0 1 1 1 1 1 0]
[0 0 0 0 0 1 0 0 0]
[0 0 0 0 0 0 0 0 0]]
The output I need is simply the IDs (for both clouds) of each A_Cloud which is overlapping with a B_Cloud, and the number (locations not needed) of pixels which are overlapping between those clouds.
The problem is that these are both very large 3 dimensional arrays (~2000x2000x200, both are the same size). I'm basically doing a bunch of nested for loops, which is of course very slow. Is there a faster way that I could approach this problem? Thanks in advance.
This is what I have right now (simplified to 2d):
final_matches = []
for Acloud_id in ACloud_list:
Acloud_locs = list(set([(i,j) for j, line in enumerate(Array_A) for i,pix in enumerate(line) if pix == Acloud_id]))
matches = []
for loc in Acloud_locs:
Bcloud_pix = Array_B[loc[0]][loc[1]]
if Bcloud_pix:
matches.append(Bcloud_pix)
counter=collections.Counter(matches)
final_matches.append([Acloud_id, counter])
Thanks in advance!
Some considerations here:
for Acloud_id in ACloud_list:
Acloud_locs = list(set([(i,j) for j, line in enumerate(Array_A) for i,pix in enumerate(line) if pix == Acloud_id]))
If I've read that right, this needs to check every pixel in the array in order to generate the set, and it repeats that for every cloud in A. So if you have 500 clouds, you're checking every pixel 500 times. This is not going to scale well!
Might be more efficient to store the overlap counts in a dict, and just go through the arrays once:
overlaps=dict()
for i in possible_x_coords: # define these however you like
for j in possible_y_coords:
if (Array_A[i][j] and Array_B[i][j]):
overlaps[(Array_A[i][j],Array_B[i][j])] = 1 + overlaps.get((Array_A[i][j],Array_B[i][j]),0)
(apologies for any errors, I'm on the road and can't test my code)
update: You've clarified that the arrays are about 80% sparse. If that figure was a lot higher, and if you had control over the format of your inputs, I'd suggest looking into sparse array formats - if your input only stores the non-zero values for A, this can save you the trouble of checking for zero values in A. However, for something that's only 80% sparse, I'm not sure how much efficiency this would add.
big_array = np.array((
[0,1,0,0,1,0,0,1],
[0,1,0,0,0,0,0,0],
[0,1,0,0,1,0,0,0],
[0,0,0,0,1,0,0,0],
[1,0,0,0,1,0,0,0]))
print(big_array)
[[0 1 0 0 1 0 0 1]
[0 1 0 0 0 0 0 0]
[0 1 0 0 1 0 0 0]
[0 0 0 0 1 0 0 0]
[1 0 0 0 1 0 0 0]]
Is there a way to iterate over this numpy array and for each 2x2 cluster of 0s, set all values within that cluster = 5? This is what the output would look like.
[[0 1 5 5 1 5 5 1]
[0 1 5 5 0 5 5 0]
[0 1 5 5 1 5 5 0]
[0 0 5 5 1 5 5 0]
[1 0 5 5 1 5 5 0]]
My thoughts are to use advanced indexing to set the 2x2 shape = to 5, but I think it would be really slow to simply iterate like:
1) check if array[x][y] is 0
2) check if adjacent array elements are 0
3) if all elements are 0, set all those values to 5.
big_array = [1, 7, 0, 0, 3]
i = 0
p = 0
while i <= len(big_array) - 1 and p <= len(big_array) - 2:
if big_array[i] == big_array[p + 1]:
big_array[i] = 5
big_array[p + 1] = 5
print(big_array)
i = i + 1
p = p + 1
Output:
[1, 7, 5, 5, 3]
It is a example, not whole correct code.
Here's a solution by viewing the array as blocks.
First you need to define this function rolling_window from here https://gist.github.com/seberg/3866040/revisions
Then break the array big, your starting array, into 2x2 blocks using this function.
Also generate an array which has indices of every element in big and break it similarly into 2x2 blocks.
Then generate a boolean mask where the 2x2 blocks of big are all zero, and use the index array to get those elements.
blks = rolling_window(big,window=(2,2)) # 2x2 blocks of original array
inds = np.indices(big.shape).transpose(1,2,0) # array of indices into big
blkinds = rolling_window(inds,window=(2,2,0)).transpose(0,1,4,3,2) # 2x2 blocks of indices into big
mask = blks == np.zeros((2,2)) # generate a mask of every 2x2 block which is all zero
mask = mask.reshape(*mask.shape[:-2],-1).all(-1) # still generating the mask
# now blks[mask] is every block which is zero..
# but you actually want the original indices in the array 'big' instead
inds = blkinds[mask].reshape(-1,2).T # indices into big where elements need replacing
big[inds[0],inds[1]] = 5 #reassign
You need to test this: I did not. But the idea is to break the array into blocks, and an array of indices into blocks, then develop a boolean condition on the blocks, use those to get the indices, and then reassign.
An alternative would be to iterate through indblks as defined here, then test the 2x2 obtained from big at each indblk element and reassign if necessary.
This is my attempt to help you solve your problem. My solution may be subject to fair criticism.
import numpy as np
from itertools import product
m = np.array((
[0,1,0,0,1,0,0,1],
[0,1,0,0,0,0,0,0],
[0,1,0,0,1,0,0,0],
[0,0,0,0,1,0,0,0],
[1,0,0,0,1,0,0,0]))
h = 2
w = 2
rr, cc = tuple(d + 1 - q for d, q in zip(m.shape, (h, w)))
slices = [(slice(r, r + h), slice(c, c + w))
for r, c in product(range(rr), range(cc))
if not m[r:r + h, c:c + w].any()]
for s in slices:
m[s] = 5
print(m)
[[0 1 5 5 1 5 5 1]
[0 1 5 5 0 5 5 5]
[0 1 5 5 1 5 5 5]
[0 5 5 5 1 5 5 5]
[1 5 5 5 1 5 5 5]]
I have a numpy array of size 2000*4000 with binary values in it. I have a template that I would like to match with my source array. Currently I'm running a sliding window over the source array. Although this method works fine, it's very time consuming. I'm guessing a native implementation would be much faster. Also can it match multiple occurrences of the template?
Something like
x = [[0 0 1 1 1 1 1 0 0]
[0 0 1 1 0 0 0 0 0]
[0 0 1 1 0 0 0 0 0]
[0 0 1 1 0 0 0 0 0]]
template = [[1 1 1 1]
[1 0 0 0]
[1 0 0 0]]
cords = np.matchtemplate(x, template)
And printing the cords should ideally give a list of tuples which has the diagonal coordinates of the matching segment.
print(cords)
[[(0, 3), (6, 2)]]
A solution which uses OpenCV:
import cv2
result = cv2.matchTemplate(
x.astype(np.float32),
template.astype(np.float32),
cv2.TM_SQDIFF)
positions = np.argwhere(result == 0.0)
This gives (0, 3) for your example.
As #MPA suggested, this will provide with a list of candidates:
from scipy import signal
match = np.sum(template)
tst = signal.convolve2d(x, template[::-1, ::-1], mode='valid') == match
candidates = np.argwhere(tst)
This gives (0, 2) and (0, 3) for your example.
For binary matrices, one can do as #Paul suggests:
from scipy import signal
match = np.sum(template)
tst = signal.convolve2d(x, (2 * template - 1)[::-1, ::-1], mode='valid') == match
positions = np.argwhere(tst)
This gives (0, 3) for your example.
I need help creating a list for each of the 9 3x3 blocks in sudoku. so I have a list of lists representing the original sudoku board (zero means empty):
board=[[2,0,0,0,0,0,0,6,0],
[0,0,0,0,7,5,0,3,0],
[0,4,8,0,9,0,1,0,0],
[0,0,0,3,0,0,0,0,0],
[3,0,0,0,1,0,0,0,9],
[0,0,0,0,0,8,0,0,0],
[0,0,1,0,2,0,5,7,0],
[0,8,0,7,3,0,0,0,0],
[0,9,0,0,0,0,0,0,4]]
I need to turn these into a list of lists containing the 3x3 blocks. So for example:
[[2,0,0,0,0,0,0,4,8],[etc]]
i tried creating one list called "blocks" containing 9 other lists with just zeroes in each list. so it looked like:
blocks=[[0,0,0,0,0,0,0,0,0],[etc]
then i used a while loop to change the values in the list:
BLOCK_COUNT=0
BOARD_COUNT=0
while BLOCK_COUNT<len(blocks):
blocks[BLOCK_COUNT][0]=board[BOARD_COUNT][BOARD_COUNT]
blocks[BLOCK_COUNT][1]=board[BOARD_COUNT][BOARD_COUNT+1]
blocks[BLOCK_COUNT][2]=board[BOARD_COUNT][BOARD_COUNT+2]
blocks[BLOCK_COUNT][3]=board[BOARD_COUNT+1][BOARD_COUNT]
blocks[BLOCK_COUNT][4]=board[BOARD_COUNT+1][BOARD_COUNT+1]
blocks[BLOCK_COUNT][5]=board[BOARD_COUNT+1][BOARD_COUNT+2]
blocks[BLOCK_COUNT][6]=board[BOARD_COUNT+2][BOARD_COUNT]
blocks[BLOCK_COUNT][7]=board[BOARD_COUNT+2][BOARD_COUNT+1]
blocks[BLOCK_COUNT][8]=board[BOARD_COUNT+2][BOARD_COUNT+2]
BLOCK_COUNT+=1
BOARD_COUNT+=3
This however gives me an index error. if I create 2 of those while loops with "BLOCK_COUNT" being 3 and 6 respectively then i get a better answer but it still doesn't give me the correct 3x3 block for some. So i'm pretty much at a loss for how to do this. Thanks.
def getBlocks(board):
answer = []
for r,c in itertools.product(range(3), repeat=2):
answer.append([board[r+i][c+j] for i,j in itertools.product(range(0, 9, 3), repeat=2)])
return answer
Of course, you could replace the whole thing with just one list comprehension:
answer = [[board[r+i][c+j] for i,j in itertools.product(range(0, 9, 3), repeat=2)]
for r,c in itertools.product(range(3), repeat=2)]
In case you are interested in a version that doesn't use any built-ins to do any heavy lifting:
def getBlocks(board):
answer = []
for r in range(3):
for c in range(3):
block = []
for i in range(3):
for j in range(3):
block.append(board[3*r + i][3*c + j])
answer.append(block)
return answer
So what's happening here?:
Well, first, we decide to iterate over the 9 blocks that we want. These are governed by the r and c variables. This is also why we multiply them by 3 when we access the numbers on the board (because each block is a square of side 3).
Next, we want to iterate over the elements in each block. Translation: Lookup the numbers within each 3x3 block. The index of each element within the block is governed by i and j. So we have i and j that govern the elements we want to access, along with r and c, which are their offsets from the board itself, determining the location of the "block" we want. Now we're off to the races.
For each r and c (notice that each loops over range(3), so there are 9 (r,c) pairs - the 9 blocks that we are after), loop over the 9 elements in the block (the 9 (i,j) pairs). Now, simply access the elements based on their relative locations from the (r,c) offsets (3*r gives the first row of the relevant block, and adding i gives the row of the required element. Similarly, 3*c gives the first column of the relevant block, and adding j gives the column of the required element. Thus, we have the coordinates of the element we want). Now, we add the element to block.
Once we've looped over all the elements in the block, we add the block itself to the answer, and presto! we're done
You can do this with a combination of reshape and transpose when you use numpy.
edit - sorry - hit enter too soon:
import numpy as np
board=[[2,0,0,0,0,0,0,6,0],
[0,0,0,0,7,5,0,3,0],
[0,4,8,0,9,0,1,0,0],
[0,0,0,3,0,0,0,0,0],
[3,0,0,0,1,0,0,0,9],
[0,0,0,0,0,8,0,0,0],
[0,0,1,0,2,0,5,7,0],
[0,8,0,7,3,0,0,0,0],
[0,9,0,0,0,0,0,0,4]]
t = np.array(board).reshape((3,3,3,3)).transpose((0,2,1,3)).reshape((9,9));
print t
Output:
[[2 0 0 0 0 0 0 4 8]
[0 0 0 0 7 5 0 9 0]
[0 6 0 0 3 0 1 0 0]
[0 0 0 3 0 0 0 0 0]
[3 0 0 0 1 0 0 0 8]
[0 0 0 0 0 9 0 0 0]
[0 0 1 0 8 0 0 9 0]
[0 2 0 7 3 0 0 0 0]
[5 7 0 0 0 0 0 0 4]]
should work, in python3 you might replace "(m/3)*3" with "int(m/3)*3"
[[board[(m/3)*3+i][(m%3)*3+j] for i in range(3) for j in range(3)] for m in range(9)]
this uses no builtins and is faster 3 nested for loops
def get_boxes(board):
boxes = []
for i in range(9):
if i == 0 or i % 3 == 0:
box_set_1 = board[i][:3] + board[i + 1][:3] + board[i + 2][:3]
boxes.append(box_set_1)
box_set_2 = board[i][3:6] + board[i + 1][3:6] + board[i + 2][3:6]
boxes.append(box_set_2)
box_set_3 = board[i][6:] + board[i + 1][6:] + board[i + 2][6:]
boxes.append(box_set_3)
def get_boxes(board):
boxes = []
for i in range(9):
if i == 0 or i % 3 == 0:
box_set_1 = board[i][:3] + board[i + 1][:3] + board[i + 2][:3]
boxes.append(box_set_1)
box_set_2 = board[i][3:6] + board[i + 1][3:6] + board[i + 2][3:6]
boxes.append(box_set_2)
box_set_3 = board[i][6:] + board[i + 1][6:] + board[i + 2][6:]
boxes.append(box_set_3)
return boxes
Past midnight and maybe someone has an idea how to tackle a problem of mine. I want to count the number of adjacent cells (which means the number of array fields with other values eg. zeroes in the vicinity of array values) as sum for each valid value!.
Example:
import numpy, scipy
s = ndimage.generate_binary_structure(2,2) # Structure can vary
a = numpy.zeros((6,6), dtype=numpy.int) # Example array
a[2:4, 2:4] = 1;a[2,4] = 1 # with example value structure
print a
>[[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 1 1 1 0]
[0 0 1 1 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]]
# The value at position [2,4] is surrounded by 6 zeros, while the one at
# position [2,2] has 5 zeros in the vicinity if 's' is the assumed binary structure.
# Total sum of surrounding zeroes is therefore sum(5+4+6+4+5) == 24
How can i count the number of zeroes in such way if the structure of my values vary?
I somehow believe to must take use of the binary_dilation function of SciPy, which is able to enlarge the value structure, but simple counting of overlaps can't lead me to the correct sum or does it?
print ndimage.binary_dilation(a,s).astype(a.dtype)
[[0 0 0 0 0 0]
[0 1 1 1 1 1]
[0 1 1 1 1 1]
[0 1 1 1 1 1]
[0 1 1 1 1 0]
[0 0 0 0 0 0]]
Use a convolution to count neighbours:
import numpy
import scipy.signal
a = numpy.zeros((6,6), dtype=numpy.int) # Example array
a[2:4, 2:4] = 1;a[2,4] = 1 # with example value structure
b = 1-a
c = scipy.signal.convolve2d(b, numpy.ones((3,3)), mode='same')
print numpy.sum(c * a)
b = 1-a allows us to count each zero while ignoring the ones.
We convolve with a 3x3 all-ones kernel, which sets each element to the sum of it and its 8 neighbouring values (other kernels are possible, such as the + kernel for only orthogonally adjacent values). With these summed values, we mask off the zeros in the original input (since we don't care about their neighbours), and sum over the whole array.
I think you already got it. after dilation, the number of 1 is 19, minus 5 of the starting shape, you have 14. which is the number of zeros surrounding your shape. Your total of 24 has overlaps.