How can I optimize searching and matching through multi-dimensional arrays? - python

I'm trying to match up the elements in 2 different arrays. Array_A is a 3d map of A_Clouds, Array_B is a 3d map of B_Clouds. Each "cloud" is continuous, i.e. any isolated pixels would define a new cloud. The values of the pixels are a single, unique integer for each cloud. Non-cloud values are 0. Here's a 2D example:
[[0 0 0 0 0 0 0 0 0]
[0 0 0 1 1 1 0 0 0]
[0 0 1 1 1 1 1 1 0]
[0 0 0 1 1 1 1 1 0]
[0 0 0 0 0 1 0 0 0]
[0 0 0 0 0 0 0 0 0]]
The output I need is simply the IDs (for both clouds) of each A_Cloud which is overlapping with a B_Cloud, and the number (locations not needed) of pixels which are overlapping between those clouds.
The problem is that these are both very large 3 dimensional arrays (~2000x2000x200, both are the same size). I'm basically doing a bunch of nested for loops, which is of course very slow. Is there a faster way that I could approach this problem? Thanks in advance.
This is what I have right now (simplified to 2d):
final_matches = []
for Acloud_id in ACloud_list:
Acloud_locs = list(set([(i,j) for j, line in enumerate(Array_A) for i,pix in enumerate(line) if pix == Acloud_id]))
matches = []
for loc in Acloud_locs:
Bcloud_pix = Array_B[loc[0]][loc[1]]
if Bcloud_pix:
matches.append(Bcloud_pix)
counter=collections.Counter(matches)
final_matches.append([Acloud_id, counter])
Thanks in advance!

Some considerations here:
for Acloud_id in ACloud_list:
Acloud_locs = list(set([(i,j) for j, line in enumerate(Array_A) for i,pix in enumerate(line) if pix == Acloud_id]))
If I've read that right, this needs to check every pixel in the array in order to generate the set, and it repeats that for every cloud in A. So if you have 500 clouds, you're checking every pixel 500 times. This is not going to scale well!
Might be more efficient to store the overlap counts in a dict, and just go through the arrays once:
overlaps=dict()
for i in possible_x_coords: # define these however you like
for j in possible_y_coords:
if (Array_A[i][j] and Array_B[i][j]):
overlaps[(Array_A[i][j],Array_B[i][j])] = 1 + overlaps.get((Array_A[i][j],Array_B[i][j]),0)
(apologies for any errors, I'm on the road and can't test my code)
update: You've clarified that the arrays are about 80% sparse. If that figure was a lot higher, and if you had control over the format of your inputs, I'd suggest looking into sparse array formats - if your input only stores the non-zero values for A, this can save you the trouble of checking for zero values in A. However, for something that's only 80% sparse, I'm not sure how much efficiency this would add.

Related

Creating submatrix in python

Given a matrix S and a binary matrix W, I want to create a submatrix of S corresponding to the non zero coordinates of W.
For example:
S = [[1,1],[1,2],[1,3],[1,4],[1,5]]
W = [[1,0,0],[1,1,0],[1,1,1],[0,1,1],[0,0,1]]
I want to get matrices
S_1 = [[1,1],[1,2],[1,3]]
S_2 = [[1,2],[1,3],[1,4]]
S_3 = [[1,3],[1,4],[1,5]]
I couldn't figure out a slick way to do this in python. The best I could do for each S_i is
S_1 = S[0,:]
for i in range(np.shape(W)[0]):
if W[i, 0] == 1:
S_1 = np.vstack((S_1, S[i, :]))
but if i want to change the dimensions of the problem and have, say, 100 S_i's, writing a for loop for each one seems a bit ugly. (Side note: S_1 should be initialized to some empty 2d array but I couldn't get that to work, so initialized it to S[0,:] as a placeholder).
EDIT: To clarify what I mean:
I have a matrix S
1 1
1 2
1 3
1 4
1 5
and I have a binary matrix
1 0 0
1 1 0
1 1 1
0 1 1
0 0 1
Given the first column of the binary matrix W
1
1
1
0
0
The 1's are in the first, second, and third positions. So I want to create a corresponding submatrix of S with just the first, second and third positions of every column, so S_1 (corresponding to the 1st column of W) is
1 1
1 2
1 3
Similarly, if we look at the third column of W
0
0
1
1
1
The 1's are in the last three coordinates and so I want a submatrix of S with just the last three coordinates of every column, called S_3
1 3
1 4
1 5
So given any ith column of the binary matrix, I'm looking to generate a submatrix S_i where the columns of S_i contain the columns of S, but only the entries corresponding to the positions of the 1's in the ith column of the binary matrix.
It probably is more useful to work with the transpose of W rather than W itself, both for human-readability and to facilitate writing the code. This means that the entries that affect each S_i are grouped together in one of the inner parentheses of W, i.e. in a row of W rather than a column as you have it now.
Then, S_i = np.array[S[j,:] for j in np.shape(S)[0] if W_T[i,j] == 1], where W_T is the transpose of W. If you need/want to stick with W as is, you need to reverse the indices i and j.
As for the outer loop, you could try to nest this in another similar comprehension without an if statement--however this might be awkward since you aren't actually building one output matrix (the S_i can easily be different dimensions, unless you're somehow guaranteed to have the same number of 1s in every column of W). This in fact raises the question of what you want--a list of these arrays S_i? Otherwise if they are separate variables as you have it written, there's no good way to refer to them in a generalizable way as they don't have indices.
Numpy can do this directly.
import numpy as np
S = np.array([[1,1],[1,2],[1,3],[1,4],[1,5]])
W = np.array([[1,0,0],[1,1,0],[1,1,1],[0,1,1],[0,0,1]])
for row in range(W.shape[1]):
print(S[W[:,row]==1])
Output:
[[1 1]
[1 2]
[1 3]]
[[1 2]
[1 3]
[1 4]]
[[1 3]
[1 4]
[1 5]]

Python 9x9 and 3x3 array validation excluding 0

I am trying to validate if any numbers are duplicates in a 9x9 array however need to exclude all 0 as they are the once I will solve later. I have a 9x9 array and would like to validate if there are any duplicates in the rows and columns however excluding all 0 from the check only numbers from 1 to 9 only. The input array as example would be:
[[1 0 0 7 0 0 0 0 0]
[0 3 2 0 0 0 0 0 0]
[0 0 0 6 0 0 0 0 0]
[0 8 0 0 0 2 0 7 0]
[5 0 7 0 0 1 0 0 0]
[0 0 0 0 0 3 6 1 0]
[7 0 0 0 0 0 2 0 9]
[0 0 0 0 5 0 0 0 0]
[3 0 0 0 0 4 0 0 5]]
Here is where I am currently with my code for this:
#Checking Columns
for c in range(9):
line = (test[:,c])
print(np.unique(line).shape == line.shape)
#Checking Rows
for r in range(9):
line = (test[r,:])
print(np.unique(line).shape == line.shape)
Then I would like to do the exact same for the 3x3 sub arrays in the 9x9 array. Again I need to somehow exclude the 0 from the check. Here is the code I currently have:
for r0 in range(3,9,3):
for c0 in range(3,9,3):
test1 = test[:r0,:c0]
for r in range(3):
line = (test1[r,:])
print(np.unique(line).shape == line.shape)
for c in range(3):
line = (test1[:,c])
print(np.unique(line).shape == line.shape)
``
I would truly appreciate assistance in this regard.
It sure sounds like you're trying to verify the input of a Sudoku board.
You can extract a box as:
for r0 in range(0, 9, 3):
for c0 in range(0, 9, 3):
box = test1[r0:r0+3, c0:c0+3]
... test that np.unique(box) has 9 elements...
Note that this is only about how to extract the elements of the box. You still haven't done anything about removing the zeros, here or on the rows and columns.
Given a box/row/column, you then want something like:
nonzeros = [x for x in box.flatten() if x != 0]
assert len(nonzeros) == len(set(nonzeros))
There may be a more numpy-friendly way to do this, but this should be fast enough.
Excluding zeros is fairly straight forward by masking the array
test = np.array(test)
non_zero_mask = (test != 0)
At this point you can either check the whole matrix for uniqueness
np.unique(test[non_zero_mask])
or you can do it for individual rows/columns
non_zero_row_0 = test[0, non_zero_mask[0]]
unique_0 = np.unique(non_zero_row_0)
You can add the logic above into a loop to get the behavior you want
As for the 3x3 subarrays, you can loop through them as you did in your example.
When you have a small collection of things (small being <=64 or 128, depending on architecture), you can turn it into a set using bits. So for example:
bits = ((2**board) >> 1).astype(np.uint16)
Notice that you have to use right shift after the fact rather than pre-subtracting 1 from board to cleanly handle zeros.
You can now compute three types of sets. Each set is the bitwise OR of bits in a particular arrangement. For this example, you can use sum just the same:
rows = bits.sum(axis=1)
cols = bits.sum(axis=0)
blocks = bits.reshape(3, 3, 3, 3).sum(axis=(1, 3))
Now all you have to do is compare the bit counts of each number to the number of non-zero elements. They will be equal if and only if there are no duplicates. Duplicates will cause the bit count to be smaller.
There are pretty efficient algorithms for counting bits, especially for something as small as a uint16. Here is an example: How to count the number of set bits in a 32-bit integer?. I've adapted it for the smaller size and numpy here:
def count_bits16(arr):
count = arr - ((arr >> 1) & 0x5555)
count = (count & 0x3333) + ((count >> 2) & 0x3333)
return (count * 0x0101) >> 8
This is the count of unique elements for each of the configurations. You need to compare it to the number of non-zero elements. The following boolean will tell you if the board is valid:
count_bits16(rows) == np.count_nonzero(board, axis=1) and \
count_bits16(cols) == np.count_nonzero(board, axis=0) and \
count_bits16(blocks) == np.count_nonzero(board.reshape(3, 3, 3, 3), axis=(1, 3))

Numpy Interpolation Between Points Within Array (scipy.griddata)

I have a numpy array of a fixed size holding irregularly spaced data. An example would be:
[1 0 0 0 3 0 0 0 2 0
0 1 0 0 0 0 0 0 2 0
0 1 0 0 1 0 6 0 9 0
0 0 0 0 6 0 3 0 0 1]
I want to keep the array the same shape, but have all the 0 values overwritten with data interpolated from the points that do have data. If the data points in the array are thought of as height values, this would essentially be creating a surface over the points.
I have been trying to use scipy.interpolate.griddata but am continually getting errors. I start with an array of my known data points, as [x, y, value]. For the above, (first row only for brevity)
data = [0, 0, 1
0, 3, 3
0, 8, 2 ....................
I then define
points = (data[:,0], data[:,1])
values = (data[:,2])
Next, I define the points to sample at (in this case, the grid I desire)
grid = np.indices((4,10))
Finally, call griddata
t = interpolate.griddata(points, values, grid, method = 'linear')
This returns the following error
ValueError: number of dimensions in xi does not match x
Am I using the wrong function?
Thanks!
Solved: You need to pass the desired points as a tuple
t = interpolate.griddata(points, values, (grid[0,:,:], grid[1,:,:]), method = 'linear')

numpy image replace areas with equal values

So I have a gray (2D) image of type np.array with a lot of zeros and objects inside of it. Each object is defined by its pixels having the same value, e.g. 1.23e15.
I now want to label the image, i.e. I want to rescale all pixels of a certain value (eg 200 pixels of the above value 1.23e15) to one integer number.
Apart from the background which is zero, I want each region to be set to one of the values in range(1,nbr_of_regions_in_img+1).
How can I do this time efficiently (I have hundreds of thousands of images) without the obvious looping solution?
Scipy has an extensive library for image manipulation and analysis. The function, you are looking for is probably scipy.ndimage.label
import scipy.ndimage
import numpy as np
pix = np.array([[0,0,1,1,0,0],
[0,0,1,1,1,0],
[1,1,0,0,1,0],
[0,1,0,0,0,0]])
mask_obj, n_obj = scipy.ndimage.label(pix)
The output gives you both, a labelled mask with a different number for each identified object and the number of identified objects.
>>>print(n_obj)
>>>2
>>>print(mask_obj)
>>>[[0 0 1 1 0 0]
[0 0 1 1 1 0]
[2 2 0 0 1 0]
[0 2 0 0 0 0]]
You can also define, what should count as a neighbouring cell with the structure parameter:
s = np.asarray([[1,1,1],
[1,1,1],
[1,1,1]])
mask_obj, n_obj = scipy.ndimage.label(pix, structure = s)
>>>print(n_obj)
>>>1
>>>print(mask_obj)
>>>[[0 0 1 1 0 0]
[0 0 1 1 1 0]
[1 1 0 0 1 0]
[0 1 0 0 0 0]]
Difficulties will arise, if different objects touch each other, i.e. they are not separated by a zero value.

Counting of adjacent cells in a numpy array

Past midnight and maybe someone has an idea how to tackle a problem of mine. I want to count the number of adjacent cells (which means the number of array fields with other values eg. zeroes in the vicinity of array values) as sum for each valid value!.
Example:
import numpy, scipy
s = ndimage.generate_binary_structure(2,2) # Structure can vary
a = numpy.zeros((6,6), dtype=numpy.int) # Example array
a[2:4, 2:4] = 1;a[2,4] = 1 # with example value structure
print a
>[[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 1 1 1 0]
[0 0 1 1 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]]
# The value at position [2,4] is surrounded by 6 zeros, while the one at
# position [2,2] has 5 zeros in the vicinity if 's' is the assumed binary structure.
# Total sum of surrounding zeroes is therefore sum(5+4+6+4+5) == 24
How can i count the number of zeroes in such way if the structure of my values vary?
I somehow believe to must take use of the binary_dilation function of SciPy, which is able to enlarge the value structure, but simple counting of overlaps can't lead me to the correct sum or does it?
print ndimage.binary_dilation(a,s).astype(a.dtype)
[[0 0 0 0 0 0]
[0 1 1 1 1 1]
[0 1 1 1 1 1]
[0 1 1 1 1 1]
[0 1 1 1 1 0]
[0 0 0 0 0 0]]
Use a convolution to count neighbours:
import numpy
import scipy.signal
a = numpy.zeros((6,6), dtype=numpy.int) # Example array
a[2:4, 2:4] = 1;a[2,4] = 1 # with example value structure
b = 1-a
c = scipy.signal.convolve2d(b, numpy.ones((3,3)), mode='same')
print numpy.sum(c * a)
b = 1-a allows us to count each zero while ignoring the ones.
We convolve with a 3x3 all-ones kernel, which sets each element to the sum of it and its 8 neighbouring values (other kernels are possible, such as the + kernel for only orthogonally adjacent values). With these summed values, we mask off the zeros in the original input (since we don't care about their neighbours), and sum over the whole array.
I think you already got it. after dilation, the number of 1 is 19, minus 5 of the starting shape, you have 14. which is the number of zeros surrounding your shape. Your total of 24 has overlaps.

Categories