It's past midnight, and maybe someone has an idea how to tackle a problem of mine. For every valid (non-zero) value in an array, I want to count its adjacent cells that hold some other value (e.g. zeros), and then sum those counts over all valid values.
Example:
import numpy
from scipy import ndimage
s = ndimage.generate_binary_structure(2,2) # Structure can vary
a = numpy.zeros((6,6), dtype=int) # Example array
a[2:4, 2:4] = 1;a[2,4] = 1 # with example value structure
print(a)
>[[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 1 1 1 0]
[0 0 1 1 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]]
# The value at position [2,4] is surrounded by 6 zeros, while the one at
# position [2,2] has 5 zeros in the vicinity if 's' is the assumed binary structure.
# Total sum of surrounding zeroes is therefore sum(5+4+6+4+5) == 24
How can I count the zeros this way when the structure of my values varies?
I suspect I need SciPy's binary_dilation function, which can enlarge the value structure, but simply counting the overlaps doesn't seem to give the correct sum. Or does it?
print(ndimage.binary_dilation(a,s).astype(a.dtype))
[[0 0 0 0 0 0]
[0 1 1 1 1 1]
[0 1 1 1 1 1]
[0 1 1 1 1 1]
[0 1 1 1 1 0]
[0 0 0 0 0 0]]
Use a convolution to count neighbours:
import numpy
import scipy.signal
a = numpy.zeros((6,6), dtype=int) # Example array
a[2:4, 2:4] = 1;a[2,4] = 1 # with example value structure
b = 1-a
c = scipy.signal.convolve2d(b, numpy.ones((3,3)), mode='same')
print(numpy.sum(c * a))
b = 1-a allows us to count each zero while ignoring the ones.
We convolve with a 3x3 all-ones kernel, which sets each element to the sum of it and its 8 neighbouring values (other kernels are possible, such as the + kernel for only orthogonally adjacent values). With these summed values, we mask off the zeros in the original input (since we don't care about their neighbours), and sum over the whole array.
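If the neighbourhood should follow the structure s from the question rather than a hard-coded 3x3 block, the structure itself can serve as the kernel. Here is a minimal sketch, assuming scipy.ndimage.correlate with zero padding at the border (for symmetric kernels this is the same as a convolution); the helper name count_adjacent_zeros is made up for illustration:

```python
import numpy as np
from scipy import ndimage

def count_adjacent_zeros(a, structure):
    """Sum, over all non-zero cells of a, the zeros inside each cell's neighbourhood."""
    zeros = (a == 0).astype(int)
    # Each output cell holds the number of zeros under the structure centred there.
    # Positions outside the array are padded with 0, so they are not counted.
    counts = ndimage.correlate(zeros, structure.astype(int), mode="constant", cval=0)
    return int(counts[a != 0].sum())

a = np.zeros((6, 6), dtype=int)
a[2:4, 2:4] = 1
a[2, 4] = 1

full = ndimage.generate_binary_structure(2, 2)  # 8-neighbourhood
plus = ndimage.generate_binary_structure(2, 1)  # orthogonal neighbours only
total_full = count_adjacent_zeros(a, full)
total_plus = count_adjacent_zeros(a, plus)
```

For the example array this gives 24 with the full structure and 10 with the plus-shaped one.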
I think you already got it. After dilation the number of 1s is 19; subtract the 5 cells of the starting shape and you get 14, which is the number of distinct zeros surrounding your shape. Your total of 24 counts zeros that are shared by several cells more than once.
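To make the difference concrete, here is a small sketch of the dilation count on the example array from the question. It counts each surrounding zero once, no matter how many cells of the shape it touches:

```python
import numpy as np
from scipy import ndimage

a = np.zeros((6, 6), dtype=int)
a[2:4, 2:4] = 1
a[2, 4] = 1
s = ndimage.generate_binary_structure(2, 2)

dilated = ndimage.binary_dilation(a, structure=s)
# Cells gained by the dilation are exactly the zeros touching the shape.
distinct_zeros = int(dilated.sum() - a.sum())
```

Here distinct_zeros is 14, whereas the per-cell sum from the question is 24.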
I am trying to check a 9x9 array for duplicate numbers, but I need to exclude all 0s, as those are the cells I will solve later. I would like to validate that no row or column contains a duplicate, checking only the numbers 1 to 9 and ignoring the 0s. An example input array would be:
[[1 0 0 7 0 0 0 0 0]
[0 3 2 0 0 0 0 0 0]
[0 0 0 6 0 0 0 0 0]
[0 8 0 0 0 2 0 7 0]
[5 0 7 0 0 1 0 0 0]
[0 0 0 0 0 3 6 1 0]
[7 0 0 0 0 0 2 0 9]
[0 0 0 0 5 0 0 0 0]
[3 0 0 0 0 4 0 0 5]]
Here is where I am currently with my code for this:
#Checking Columns
for c in range(9):
    line = (test[:,c])
    print(np.unique(line).shape == line.shape)
#Checking Rows
for r in range(9):
    line = (test[r,:])
    print(np.unique(line).shape == line.shape)
Then I would like to do the exact same for the 3x3 sub arrays in the 9x9 array. Again I need to somehow exclude the 0 from the check. Here is the code I currently have:
for r0 in range(3,9,3):
    for c0 in range(3,9,3):
        test1 = test[:r0,:c0]
        for r in range(3):
            line = (test1[r,:])
            print(np.unique(line).shape == line.shape)
        for c in range(3):
            line = (test1[:,c])
            print(np.unique(line).shape == line.shape)
I would truly appreciate assistance in this regard.
It sure sounds like you're trying to verify the input of a Sudoku board.
You can extract a box as:
for r0 in range(0, 9, 3):
    for c0 in range(0, 9, 3):
        box = test[r0:r0+3, c0:c0+3]
        # ... test that np.unique(box) has 9 elements ...
Note that this is only about how to extract the elements of the box. You still haven't done anything about removing the zeros, here or on the rows and columns.
Given a box/row/column, you then want something like:
nonzeros = [x for x in box.flatten() if x != 0]
assert len(nonzeros) == len(set(nonzeros))
There may be a more numpy-friendly way to do this, but this should be fast enough.
Excluding zeros is fairly straightforward if you mask the array:
test = np.array(test)
non_zero_mask = (test != 0)
At this point you can either check the whole matrix for uniqueness
np.unique(test[non_zero_mask])
or you can do it for individual rows/columns
non_zero_row_0 = test[0, non_zero_mask[0]]
unique_0 = np.unique(non_zero_row_0)
You can add the logic above into a loop to get the behavior you want
As for the 3x3 subarrays, you can loop through them as you did in your example.
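Putting the masking idea into loops over rows, columns and boxes could look roughly like this. It is only a sketch: the helper names has_no_duplicates and board_is_valid are invented here, and the board is the example from the question.

```python
import numpy as np

def has_no_duplicates(cells):
    """True if the non-zero entries of cells are all distinct."""
    values = cells[cells != 0]  # boolean mask drops the zeros (and flattens)
    return np.unique(values).size == values.size

def board_is_valid(test):
    test = np.asarray(test)
    rows_ok = all(has_no_duplicates(test[r, :]) for r in range(9))
    cols_ok = all(has_no_duplicates(test[:, c]) for c in range(9))
    boxes_ok = all(
        has_no_duplicates(test[r0:r0 + 3, c0:c0 + 3])
        for r0 in range(0, 9, 3)
        for c0 in range(0, 9, 3)
    )
    return rows_ok and cols_ok and boxes_ok

test = np.array([
    [1, 0, 0, 7, 0, 0, 0, 0, 0],
    [0, 3, 2, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 6, 0, 0, 0, 0, 0],
    [0, 8, 0, 0, 0, 2, 0, 7, 0],
    [5, 0, 7, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 3, 6, 1, 0],
    [7, 0, 0, 0, 0, 0, 2, 0, 9],
    [0, 0, 0, 0, 5, 0, 0, 0, 0],
    [3, 0, 0, 0, 0, 4, 0, 0, 5],
])
valid = board_is_valid(test)
```

The example board passes; writing a second 1 into the first row makes the check fail.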
When you have a small collection of things (small being <=64 or 128, depending on architecture), you can turn it into a set using bits. So for example:
bits = ((2**board) >> 1).astype(np.uint16)
Notice that you have to use right shift after the fact rather than pre-subtracting 1 from board to cleanly handle zeros.
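You can now compute three types of sets. Each set is the bitwise OR of bits in a particular arrangement. For this example, you can use sum just the same: a duplicate makes the addition carry, which can only lower the bit count, so the comparison further down still detects it.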
rows = bits.sum(axis=1)
cols = bits.sum(axis=0)
blocks = bits.reshape(3, 3, 3, 3).sum(axis=(1, 3))
Now all you have to do is compare the bit counts of each number to the number of non-zero elements. They will be equal if and only if there are no duplicates. Duplicates will cause the bit count to be smaller.
There are pretty efficient algorithms for counting bits, especially for something as small as a uint16. One example is described in the question "How to count the number of set bits in a 32-bit integer?". I've adapted it for the smaller size and numpy here:
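def count_bits16(arr):
    count = arr - ((arr >> 1) & 0x5555)
    count = (count & 0x3333) + ((count >> 2) & 0x3333)
    count = (count + (count >> 4)) & 0x0F0F  # fold nibble counts into bytes
    return ((count * 0x0101) >> 8) & 0xFF  # add the two byte counts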
This is the count of unique elements for each of the configurations. You need to compare it to the number of non-zero elements. The following boolean will tell you if the board is valid:
(count_bits16(rows) == np.count_nonzero(board, axis=1)).all() and \
(count_bits16(cols) == np.count_nonzero(board, axis=0)).all() and \
(count_bits16(blocks) == np.count_nonzero(board.reshape(3, 3, 3, 3), axis=(1, 3))).all()
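For reference, the same idea can be written with a literal bitwise OR instead of sum. This is only a sketch: popcount here is a deliberately simple (and slow) stand-in for a bit-twiddling counter, and both helper names are made up for illustration.

```python
import numpy as np

def popcount(arr):
    """Reference bit count, one Python call per element (fine for 27 small numbers)."""
    return np.array([bin(int(v)).count("1") for v in np.ravel(arr)])

def board_is_valid_bits(board):
    board = np.asarray(board, dtype=np.uint16)
    bits = np.left_shift(np.uint16(1), board) >> 1  # 0 -> 0, k -> 1 << (k - 1)
    # Rearrange so that every row of `boxes` holds the cells of one 3x3 block.
    boxes = bits.reshape(3, 3, 3, 3).transpose(0, 2, 1, 3).reshape(9, 9)
    box_cells = board.reshape(3, 3, 3, 3).transpose(0, 2, 1, 3).reshape(9, 9)
    for b, cells in ((bits, board), (bits.T, board.T), (boxes, box_cells)):
        sets = np.bitwise_or.reduce(b, axis=1)  # union of the values in each group
        if not np.array_equal(popcount(sets), np.count_nonzero(cells, axis=1)):
            return False  # some group has fewer distinct values than filled cells
    return True

board = np.array([
    [1, 0, 0, 7, 0, 0, 0, 0, 0],
    [0, 3, 2, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 6, 0, 0, 0, 0, 0],
    [0, 8, 0, 0, 0, 2, 0, 7, 0],
    [5, 0, 7, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 3, 6, 1, 0],
    [7, 0, 0, 0, 0, 0, 2, 0, 9],
    [0, 0, 0, 0, 5, 0, 0, 0, 0],
    [3, 0, 0, 0, 0, 4, 0, 0, 5],
])
valid = board_is_valid_bits(board)
```

With OR, a duplicate simply sets a bit that is already set, so the set ends up with fewer bits than the group has non-zero cells.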
I want to iterate through my datapoints and check whether they are in the same cluster, after using KMeans to cluster them.
And then I need to create a matrix for all the datapoints, and have 1 if they belong on the same cluster, and 0 if they don't.
After using Kmeans, I'm not sure how to retrieve which cluster every datapoint belongs to so I can create such matrix.
Do I do that using the labels_ attribute?
k_means = KMeans(n_clusters=5).fit(X)
labels_columns = k_means.labels_
labels_row = k_means.labels_
matrix = np.zeros((len(labels_row), len(labels_columns)), dtype=int)
for i, row in enumerate(labels_row):
    for j, column in enumerate(labels_columns):
        if row == column:
            matrix[i, j] = 1  # same cluster
        # otherwise the entry stays 0
How do I best create this matrix? Or does labels_ provide different information from what I understand?
Any help is appreciated!
You are on the right track. KMeans.labels_ returns a vector of n elements that tells you which
cluster each point belongs to: [3, 4, 10, ...] tells you that point 0 belongs to cluster 3, point 1
belongs to cluster 4, and so on.
You can build the matrix you want in many ways. One possibility, a bit more elegant than
two for loops, is the following:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
n_samples, n_features = 10, 2
X, y = make_blobs(n_samples, n_features)
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.show()
kmeans = KMeans(n_clusters=3).fit(X)
plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_)
plt.show()
repeat_labels = np.repeat(kmeans.labels_, n_samples).reshape(n_samples, n_samples)
print(kmeans.labels_)
print(repeat_labels)
proximity_matrix = (repeat_labels == repeat_labels.T).astype(int)
print(proximity_matrix)
I use the vector of labels as my starting point. Let's say that it is the following:
[1 0 0 1 1 2 2 2 2 0]
I transform it in a 2D matrix with np.repeat which has the following shape:
[[1 1 1 1 1 1 1 1 1 1]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[1 1 1 1 1 1 1 1 1 1]
.....
So I repeat the labels as many times as is the number of points n. Then I can just check where this
matrix and its transpose are equal. That will be true only if two points belong to the same cluster:
[[1 0 0 1 1 0 0 0 0 0]
[0 1 1 0 0 0 0 0 0 1]
[0 1 1 0 0 0 0 0 0 1]
[1 0 0 1 1 0 0 0 0 0]
.....
I cast the matrix to int, but mind you that the original output is actually a boolean array.
I left the print statements and the plots in the code to hopefully make it more clear.
Hope it helps!
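The repeat-and-transpose step can also be left to NumPy broadcasting. Here is a minimal sketch that starts from the label vector only (so it runs without scikit-learn); the function name same_cluster_matrix is made up, and the labels are the example vector from the answer above:

```python
import numpy as np

def same_cluster_matrix(labels):
    """Entry (i, j) is 1 if points i and j share a cluster label, else 0."""
    labels = np.asarray(labels)
    # A column vector compared with a row vector broadcasts to an n x n boolean array.
    return (labels[:, None] == labels[None, :]).astype(int)

labels = np.array([1, 0, 0, 1, 1, 2, 2, 2, 2, 0])
proximity_matrix = same_cluster_matrix(labels)
```

With a fitted model you would pass kmeans.labels_ instead of the hand-written vector.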
I'm trying to match up the elements in 2 different arrays. Array_A is a 3d map of A_Clouds, Array_B is a 3d map of B_Clouds. Each "cloud" is continuous, i.e. any isolated pixels would define a new cloud. The values of the pixels are a single, unique integer for each cloud. Non-cloud values are 0. Here's a 2D example:
[[0 0 0 0 0 0 0 0 0]
[0 0 0 1 1 1 0 0 0]
[0 0 1 1 1 1 1 1 0]
[0 0 0 1 1 1 1 1 0]
[0 0 0 0 0 1 0 0 0]
[0 0 0 0 0 0 0 0 0]]
The output I need is simply the IDs (for both clouds) of each A_Cloud which is overlapping with a B_Cloud, and the number (locations not needed) of pixels which are overlapping between those clouds.
The problem is that these are both very large 3 dimensional arrays (~2000x2000x200, both are the same size). I'm basically doing a bunch of nested for loops, which is of course very slow. Is there a faster way that I could approach this problem? Thanks in advance.
This is what I have right now (simplified to 2d):
final_matches = []
for Acloud_id in ACloud_list:
    Acloud_locs = list(set([(i,j) for j, line in enumerate(Array_A) for i,pix in enumerate(line) if pix == Acloud_id]))
    matches = []
    for loc in Acloud_locs:
        Bcloud_pix = Array_B[loc[0]][loc[1]]
        if Bcloud_pix:
            matches.append(Bcloud_pix)
    counter=collections.Counter(matches)
    final_matches.append([Acloud_id, counter])
Thanks in advance!
Some considerations here:
for Acloud_id in ACloud_list:
    Acloud_locs = list(set([(i,j) for j, line in enumerate(Array_A) for i,pix in enumerate(line) if pix == Acloud_id]))
If I've read that right, this needs to check every pixel in the array in order to generate the set, and it repeats that for every cloud in A. So if you have 500 clouds, you're checking every pixel 500 times. This is not going to scale well!
Might be more efficient to store the overlap counts in a dict, and just go through the arrays once:
overlaps=dict()
for i in possible_x_coords: # define these however you like
    for j in possible_y_coords:
        if (Array_A[i][j] and Array_B[i][j]):
            overlaps[(Array_A[i][j],Array_B[i][j])] = 1 + overlaps.get((Array_A[i][j],Array_B[i][j]),0)
(apologies for any errors, I'm on the road and can't test my code)
update: You've clarified that the arrays are about 80% sparse. If that figure was a lot higher, and if you had control over the format of your inputs, I'd suggest looking into sparse array formats - if your input only stores the non-zero values for A, this can save you the trouble of checking for zero values in A. However, for something that's only 80% sparse, I'm not sure how much efficiency this would add.
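If the arrays are NumPy arrays, the single pass can also be pushed down into NumPy. This is a rough sketch of that idea rather than a tested solution for arrays of the size you mention: overlap_counts is a made-up name, the small arrays are invented test data, and for ~2000x2000x200 inputs you would probably have to process the volume in slabs to keep the temporary arrays in memory.

```python
import numpy as np

def overlap_counts(array_a, array_b):
    """Map (A cloud id, B cloud id) -> number of pixels where both clouds are present."""
    both = (array_a != 0) & (array_b != 0)
    # One row per overlapping pixel: [id in A, id in B].
    pairs = np.stack([array_a[both], array_b[both]], axis=1)
    unique_pairs, counts = np.unique(pairs, axis=0, return_counts=True)
    return {(int(a), int(b)): int(n) for (a, b), n in zip(unique_pairs, counts)}

array_a = np.array([[0, 1, 1, 0],
                    [0, 1, 1, 2],
                    [0, 0, 0, 2]])
array_b = np.array([[0, 0, 5, 5],
                    [0, 5, 5, 5],
                    [7, 0, 0, 7]])
overlaps = overlap_counts(array_a, array_b)
```

The same function works unchanged on 3D arrays, since the boolean mask flattens whatever shape it is given.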
So I have a gray (2D) image of type np.array with a lot of zeros and objects inside of it. Each object is defined by its pixels having the same value, e.g. 1.23e15.
I now want to label the image, i.e. I want to rescale all pixels of a certain value (eg 200 pixels of the above value 1.23e15) to one integer number.
Apart from the background which is zero, I want each region to be set to one of the values in range(1,nbr_of_regions_in_img+1).
How can I do this time efficiently (I have hundreds of thousands of images) without the obvious looping solution?
SciPy has an extensive library for image manipulation and analysis. The function you are looking for is probably scipy.ndimage.label:
import scipy.ndimage
import numpy as np
pix = np.array([[0,0,1,1,0,0],
[0,0,1,1,1,0],
[1,1,0,0,1,0],
[0,1,0,0,0,0]])
mask_obj, n_obj = scipy.ndimage.label(pix)
The output gives you both a labelled mask, with a different number for each identified object, and the number of identified objects.
>>> print(n_obj)
2
>>> print(mask_obj)
[[0 0 1 1 0 0]
 [0 0 1 1 1 0]
 [2 2 0 0 1 0]
 [0 2 0 0 0 0]]
You can also define what should count as a neighbouring cell with the structure parameter:
s = np.asarray([[1,1,1],
[1,1,1],
[1,1,1]])
mask_obj, n_obj = scipy.ndimage.label(pix, structure = s)
>>> print(n_obj)
1
>>> print(mask_obj)
[[0 0 1 1 0 0]
 [0 0 1 1 1 0]
 [1 1 0 0 1 0]
 [0 1 0 0 0 0]]
Difficulties will arise if different objects touch each other, i.e. they are not separated by a zero value: label treats every non-zero pixel as foreground, so touching objects are merged into one.
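If each object really is defined by its pixel value alone, as the question describes, relabelling by value sidesteps the touching-objects problem. Note that this swaps in np.unique for ndimage.label, so it is a different technique: two separate blobs with the same value would get the same label. A minimal sketch, with relabel_by_value being a made-up helper name and the small image invented for the example:

```python
import numpy as np

def relabel_by_value(img):
    """Replace each distinct non-zero value with an integer in 1..n; background stays 0."""
    labels = np.zeros(img.shape, dtype=int)
    nonzero = img != 0
    # inverse[k] is the index of the k-th non-zero pixel's value in the sorted values.
    values, inverse = np.unique(img[nonzero], return_inverse=True)
    labels[nonzero] = inverse.ravel() + 1
    return labels, values.size

img = np.array([[0.0, 1.23e15, 1.23e15],
                [7.5, 0.0, 1.23e15],
                [7.5, 7.5, 0.0]])
labels, n_regions = relabel_by_value(img)
```

Labels are assigned in ascending order of the original values, so 7.5 becomes 1 and 1.23e15 becomes 2 here.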
Is there a floodFill function for Python/OpenCV that takes a list of seeds and starts changing the color of their neighbours? I know that SimpleCV has a function like that (SimpleCV floodFill). According to the documentation, OpenCV has two floodFill functions, one that uses a mask and one that doesn't. I'm not able to use the OpenCV floodFill function without a mask and with a list of seeds. Any help?
This is what I'm trying to do so far:
A = np.array([[0,1,1,0],[0,0,0,0],[1,1,1,1],[1,1,1,1]],np.uint8)
mask = np.ones((A.shape[0]+2,A.shape[1]+2),np.uint8)
mask[1:-1,1:-1] = np.zeros((A.shape))
cv.floodFill(A, mask, (3,0), 0,0,0, flags=4|cv.FLOODFILL_MASK_ONLY)
print(mask)
returned mask:
[[1 1 1 1 1 1]
[1 1 0 0 1 1]
[1 1 1 1 1 1]
[1 0 0 0 0 1]
[1 0 0 0 0 1]
[1 1 1 1 1 1]]
Expected mask:
[[1 1 1 1 1 1]
[1 0 0 0 0 1]
[1 0 0 0 0 1]
[1 1 1 1 1 1]
[1 1 1 1 1 1]
[1 1 1 1 1 1]]
Original Image:
[[0 1 1 0]
[0 0 0 0]
[1 1 1 1]
[1 1 1 1]]
If you look closely at the documentation, that's one of the purposes of the mask. You can call the function (the 2nd version) multiple times, each time with a different seed, and at the end the mask will contain the area that has been flood-filled. If a new seed belongs to an area that is already flood-filled, the call returns immediately.
Use the FLOODFILL_MASK_ONLY flag, and at the end use this mask to paint your input image with the desired fill color using setTo(). (You'll have to use a sub-image of the mask, removing the first and last row and column.) Note that the result might depend on the order in which you process your seed points if you set loDiff or upDiff to something other than the default value of zero.
Also take a look at this.
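For comparison, the "one seed at a time, accumulate in a mask" idea can be sketched without OpenCV. To be clear, this is not cv.floodFill: it swaps in scipy.ndimage.label, only handles the zero-tolerance case (loDiff = upDiff = 0), uses 4-connectivity, and takes seeds as (row, col) rather than OpenCV's (x, y). The name flood_mask is made up for illustration.

```python
import numpy as np
from scipy import ndimage

def flood_mask(image, seeds):
    """Boolean mask of all pixels reachable from any seed through equal-valued neighbours."""
    filled = np.zeros(image.shape, dtype=bool)
    for r, c in seeds:
        if filled[r, c]:
            continue  # this seed lies in a region that is already filled
        # Connected components of the pixels sharing the seed's value.
        labels, _ = ndimage.label(image == image[r, c])
        filled |= labels == labels[r, c]
    return filled

A = np.array([[0, 1, 1, 0],
              [0, 0, 0, 0],
              [1, 1, 1, 1],
              [1, 1, 1, 1]], np.uint8)
one_seed = flood_mask(A, [(0, 3)])
two_seeds = flood_mask(A, [(0, 3), (2, 0)])
```

Painting the image afterwards is then a plain masked assignment, e.g. A[two_seeds] = new_value for whatever fill value you choose.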