numpy image replace areas with equal values - python

So I have a gray (2D) image of type np.array with a lot of zeros and objects inside of it. Each object is defined by its pixels having the same value, e.g. 1.23e15.
I now want to label the image, i.e. I want to map all pixels of a certain value (e.g. 200 pixels of the above value 1.23e15) to one integer.
Apart from the background, which is zero, I want each region to be set to one of the values in range(1,nbr_of_regions_in_img+1).
How can I do this time-efficiently (I have hundreds of thousands of images) without the obvious looping solution?

SciPy has an extensive library for image manipulation and analysis. The function you are looking for is probably scipy.ndimage.label:
import scipy.ndimage
import numpy as np
pix = np.array([[0,0,1,1,0,0],
                [0,0,1,1,1,0],
                [1,1,0,0,1,0],
                [0,1,0,0,0,0]])
mask_obj, n_obj = scipy.ndimage.label(pix)  # default structure: 4-connectivity
The output gives you both a labelled mask, with a different number for each identified object, and the number of identified objects:
>>>[[0 0 1 1 0 0]
[0 0 1 1 1 0]
[2 2 0 0 1 0]
[0 2 0 0 0 0]]
You can also define what counts as a neighbouring cell with the structure parameter:
s = np.asarray([[1,1,1],
                [1,1,1],
                [1,1,1]])  # 8-connectivity: diagonal neighbours count too
mask_obj, n_obj = scipy.ndimage.label(pix, structure=s)
>>>[[0 0 1 1 0 0]
[0 0 1 1 1 0]
[1 1 0 0 1 0]
[0 1 0 0 0 0]]
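Difficulties will arise if different objects touch each other, i.e. if they are not separated by a zero value: label treats every non-zero pixel as foreground regardless of its value, so touching objects are merged into one.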
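Two sketches that label by value rather than by plain non-zero connectivity, both without a Python loop over pixels. relabel_by_value maps each distinct non-zero value to an integer in 1..n via np.unique(..., return_inverse=True); that is enough when every object has its own value, but separate objects that happen to share a value get the same label. label_equal_value_regions instead treats equal-valued 4-neighbours as graph edges and runs scipy.sparse.csgraph.connected_components, so touching objects with different values stay apart. Both function names are hypothetical, and the second assumes a 2D array.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components


def relabel_by_value(img):
    """Map each distinct non-zero value to 1..n; background stays 0."""
    labels = np.zeros(img.shape, dtype=np.int64)
    nonzero = img != 0
    _, inv = np.unique(img[nonzero], return_inverse=True)
    labels[nonzero] = inv.ravel() + 1
    return labels


def label_equal_value_regions(img):
    """Label 4-connected regions of equal non-zero value as 1..n (2D only)."""
    h, w = img.shape
    idx = np.arange(h * w).reshape(h, w)          # graph node id of each pixel
    nonzero = img != 0
    # Edges between right/down neighbours that carry the same non-zero value
    same_h = (img[:, :-1] == img[:, 1:]) & nonzero[:, :-1]
    same_v = (img[:-1, :] == img[1:, :]) & nonzero[:-1, :]
    src = np.concatenate([idx[:, :-1][same_h], idx[:-1, :][same_v]])
    dst = np.concatenate([idx[:, 1:][same_h], idx[1:, :][same_v]])
    graph = coo_matrix((np.ones(src.size), (src, dst)), shape=(h * w, h * w))
    _, comp = connected_components(graph, directed=False)
    # Background pixels are singleton components; renumber only the object ones
    flat_nz = nonzero.ravel()
    uniq, inv = np.unique(comp[flat_nz], return_inverse=True)
    labels = np.zeros(h * w, dtype=np.int64)
    labels[flat_nz] = inv.ravel() + 1
    return labels.reshape(h, w), uniq.size


img = np.array([[1.23e15, 1.23e15, 7.0],
                [0.0,     7.0,     7.0],
                [1.23e15, 0.0,     0.0]])
print(relabel_by_value(img))
print(label_equal_value_regions(img))
```

In this example relabel_by_value gives both 1.23e15 objects the same label, while label_equal_value_regions finds three separate regions.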


Numpy Interpolation Between Points Within Array (scipy.griddata)

I have a numpy array of a fixed size holding irregularly spaced data. An example would be:
[1 0 0 0 3 0 0 0 2 0
0 1 0 0 0 0 0 0 2 0
0 1 0 0 1 0 6 0 9 0
0 0 0 0 6 0 3 0 0 1]
I want to keep the array the same shape, but have all the 0 values overwritten with data interpolated from the points that do have data. If the data points in the array are thought of as height values, this would essentially be creating a surface over the points.
I have been trying to use scipy.interpolate.griddata but keep getting errors. I start with an array of my known data points, as [x, y, value]. For the above (first row only, for brevity):
data = np.array([[0, 0, 1],
                 [0, 4, 3],
                 [0, 8, 2],
                 # ... remaining known points
                 ])
I then define
points = (data[:,0], data[:,1])
values = (data[:,2])
Next, I define the points to sample at (in this case, the grid I desire)
grid = np.indices((4,10))
Finally, call griddata
t = interpolate.griddata(points, values, grid, method = 'linear')
This returns the following error
ValueError: number of dimensions in xi does not match x
Am I using the wrong function?
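Solved: the desired points must be passed as a tuple of coordinate arrays, one per dimension. Passed as a single array, np.indices((4,10)) (shape (2, 4, 10)) is read as points whose last axis holds the coordinates, i.e. as 10-dimensional points, hence the error: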
t = interpolate.griddata(points, values, (grid[0,:,:], grid[1,:,:]), method = 'linear')
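Putting the pieces together, a minimal end-to-end sketch under two assumptions: every non-zero entry is a known data point (so a genuine measured 0 cannot be told apart from a gap), and cells outside the convex hull of the data, where method='linear' returns NaN, are filled with the nearest known value. The names arr, filled and nearest are illustrative.

```python
import numpy as np
from scipy import interpolate

# The example array from the question; zeros are treated as missing data
arr = np.array([[1, 0, 0, 0, 3, 0, 0, 0, 2, 0],
                [0, 1, 0, 0, 0, 0, 0, 0, 2, 0],
                [0, 1, 0, 0, 1, 0, 6, 0, 9, 0],
                [0, 0, 0, 0, 6, 0, 3, 0, 0, 1]], dtype=float)

# Known points as an (n, 2) array of (row, col), plus their values
rows, cols = np.nonzero(arr)
points = np.column_stack([rows, cols]).astype(float)
values = arr[rows, cols]

# Target grid, passed as a tuple of coordinate arrays (one per dimension)
grid_r, grid_c = np.indices(arr.shape)
filled = interpolate.griddata(points, values, (grid_r, grid_c), method='linear')

# Linear interpolation is undefined outside the convex hull (NaN there);
# fall back to the nearest known value for those cells
nearest = interpolate.griddata(points, values, (grid_r, grid_c), method='nearest')
filled = np.where(np.isnan(filled), nearest, filled)
print(filled.round(2))
```

The nearest-value fallback is only one choice; griddata's fill_value parameter sets a constant instead.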

Create a matrix for datapoints in same or different clusters

I want to iterate through my datapoints and check whether they are in the same cluster, after using KMeans to cluster them.
And then I need to create a matrix for all the datapoints, with 1 if two points belong to the same cluster and 0 if they don't.
After using KMeans, I'm not sure how to retrieve which cluster every datapoint belongs to so I can create such a matrix.
Do I do that using the labels_ attribute?
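import numpy as np
from sklearn.cluster import KMeans

k_means = KMeans(n_clusters=5).fit(X)
labels = k_means.labels_            # cluster index of each datapoint
n = len(labels)
matrix = np.zeros((n, n), dtype=int)
for i, row in enumerate(labels):
    for j, column in enumerate(labels):
        if row == column:
            matrix[i, j] = 1        # same cluster
        # otherwise the entry stays 0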
How do I best create this matrix? Or does labels_ provide different information from what I understand?
Any help is appreciated!
You are on the right track. KMeans.labels_ is a vector of n elements that tells you the cluster each point belongs to: [3, 4, 10, ...] means that point 0 belongs to cluster 3, point 1 belongs to cluster 4, and so on.
You can build the matrix you want in many ways. One possibility, a bit more elegant than two for loops, is the following:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

n_samples, n_features = 10, 2
X, y = make_blobs(n_samples=n_samples, n_features=n_features)
plt.scatter(X[:, 0], X[:, 1], c=y)                # true blob membership
plt.show()
kmeans = KMeans(n_clusters=3).fit(X)
plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_)   # KMeans assignment
plt.show()
print(kmeans.labels_)
# Row i of repeat_labels is labels_[i] repeated n_samples times
repeat_labels = np.repeat(kmeans.labels_, n_samples, axis=0).reshape(n_samples, n_samples)
print(repeat_labels)
# Entry (i, j) is 1 exactly when points i and j share a cluster
proximity_matrix = (repeat_labels == repeat_labels.T).astype(int)
print(proximity_matrix)
I use the vector of labels as my starting point. Let's say that it is the following:
[1 0 0 1 1 2 2 2 2 0]
I transform it into a 2D matrix with np.repeat; its first four rows are:
[[1 1 1 1 1 1 1 1 1 1]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[1 1 1 1 1 1 1 1 1 1]
So I repeat the labels as many times as the number of points n. Then I can just check where this matrix and its transpose are equal, which is true only if two points belong to the same cluster (first four rows again):
[[1 0 0 1 1 0 0 0 0 0]
[0 1 1 0 0 0 0 0 0 1]
[0 1 1 0 0 0 0 0 0 1]
[1 0 0 1 1 0 0 0 0 0]
I cast the matrix to int, but mind you that the original output is actually a boolean array.
I left the print statements and the plots in the code to hopefully make it clearer.
Hope it helps!
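For comparison, the same matrix can be built without np.repeat by broadcasting the label vector against itself. A minimal sketch that uses the example label vector from above in place of kmeans.labels_ (so it runs without scikit-learn); same_cluster_matrix is a hypothetical helper name.

```python
import numpy as np


def same_cluster_matrix(labels):
    """Return an (n, n) int matrix with 1 where points i and j share a cluster."""
    labels = np.asarray(labels)
    # (n, 1) compared with (1, n) broadcasts to an (n, n) boolean matrix
    return (labels[:, None] == labels[None, :]).astype(int)


labels = np.array([1, 0, 0, 1, 1, 2, 2, 2, 2, 0])   # stand-in for kmeans.labels_
proximity_matrix = same_cluster_matrix(labels)
print(proximity_matrix[:4])
```

The first four rows match the matrix shown above; broadcasting simply skips materialising the repeated-label matrix.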

How can I optimize searching and matching through multi-dimensional arrays?

I'm trying to match up the elements in 2 different arrays. Array_A is a 3d map of A_Clouds, Array_B is a 3d map of B_Clouds. Each "cloud" is continuous, i.e. any isolated pixels would define a new cloud. The values of the pixels are a single, unique integer for each cloud. Non-cloud values are 0. Here's a 2D example:
[[0 0 0 0 0 0 0 0 0]
[0 0 0 1 1 1 0 0 0]
[0 0 1 1 1 1 1 1 0]
[0 0 0 1 1 1 1 1 0]
[0 0 0 0 0 1 0 0 0]
[0 0 0 0 0 0 0 0 0]]
The output I need is simply the IDs (for both clouds) of each A_Cloud which is overlapping with a B_Cloud, and the number (locations not needed) of pixels which are overlapping between those clouds.
The problem is that these are both very large 3 dimensional arrays (~2000x2000x200, both are the same size). I'm basically doing a bunch of nested for loops, which is of course very slow. Is there a faster way that I could approach this problem? Thanks in advance.
This is what I have right now (simplified to 2d):
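final_matches = []
for Acloud_id in ACloud_list:
    Acloud_locs = list(set([(i,j) for j, line in enumerate(Array_A) for i,pix in enumerate(line) if pix == Acloud_id]))
    matches = {}  # B cloud id -> number of overlapping pixels
    for loc in Acloud_locs:
        # locs are stored as (column, row), so index B as [row][column]
        Bcloud_pix = Array_B[loc[1]][loc[0]]
        if Bcloud_pix:
            matches[Bcloud_pix] = matches.get(Bcloud_pix, 0) + 1
    for Bcloud_id, counter in matches.items():
        final_matches.append([Acloud_id, Bcloud_id, counter])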
Some considerations here:
for Acloud_id in ACloud_list:
    Acloud_locs = list(set([(i,j) for j, line in enumerate(Array_A) for i,pix in enumerate(line) if pix == Acloud_id]))
If I've read that right, this needs to check every pixel in the array in order to generate the set, and it repeats that for every cloud in A. So if you have 500 clouds, you're checking every pixel 500 times. This is not going to scale well!
It might be more efficient to store the overlap counts in a dict and go through the arrays just once:
overlaps = {}  # (A cloud id, B cloud id) -> number of overlapping pixels
for i in range(len(Array_A)):          # rows; restrict these ranges however you like
    for j in range(len(Array_A[0])):   # columns
        a, b = Array_A[i][j], Array_B[i][j]
        if a and b:
            overlaps[(a, b)] = 1 + overlaps.get((a, b), 0)
(apologies for any errors, I'm on the road and can't test my code)
update: You've clarified that the arrays are about 80% sparse. If that figure were a lot higher, and if you had control over the format of your inputs, I'd suggest looking into sparse array formats: if your input only stores the non-zero values for A, this saves you the trouble of checking for zero values in A. However, for something that's only 80% sparse, I'm not sure how much efficiency this would add.
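A further option, not taken from either post, is to let NumPy do the whole pass: select the voxels where both arrays are non-zero, then count each distinct (A id, B id) pair with np.unique(..., axis=0, return_counts=True). The sketch below works unchanged for 2D and 3D arrays. At ~2000x2000x200 the temporary mask and pair arrays are large, so the second function, an assumption about how one might bound memory, processes slabs along the first axis and merges the counts. Both function names are hypothetical.

```python
import numpy as np
from collections import Counter


def count_overlaps(array_a, array_b):
    """Map each overlapping (A cloud id, B cloud id) pair to its number of shared pixels."""
    both = (array_a != 0) & (array_b != 0)      # inside an A cloud and a B cloud
    if not both.any():
        return {}
    pairs = np.stack([array_a[both], array_b[both]], axis=1)
    uniq, counts = np.unique(pairs, axis=0, return_counts=True)
    return {(int(a), int(b)): int(n) for (a, b), n in zip(uniq, counts)}


def count_overlaps_in_slabs(array_a, array_b, slab=16):
    """Same result, computed slab by slab along axis 0 to limit memory use."""
    total = Counter()
    for start in range(0, array_a.shape[0], slab):
        total.update(count_overlaps(array_a[start:start + slab],
                                    array_b[start:start + slab]))
    return dict(total)


A = np.array([[1, 1, 0],
              [0, 2, 2]])
B = np.array([[3, 3, 0],
              [0, 3, 4]])
print(count_overlaps(A, B))
```

Each array is read once, and the per-cloud rescans of Array_A disappear.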

OpenCV FloodFill with multiple seeds

Is there a floodFill function for Python/OpenCV that takes a list of seeds and starts changing the color of their neighbours? I know that SimpleCV has a function like that (SimpleCV floodFill). The OpenCV documentation says there are two floodFill functions, one that uses a mask and another that doesn't, but I haven't been able to use the OpenCV floodFill function without a mask and with a list of seeds. Any help?
This is what I'm trying to do so far:
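import numpy as np
import cv2 as cv
A = np.array([[0,1,1,0],[0,0,0,0],[1,1,1,1],[1,1,1,1]], dtype=np.uint8)  # original image below (uint8 assumed)
mask = np.ones((A.shape[0]+2, A.shape[1]+2), np.uint8)  # mask must be 2 px larger than the image
mask[1:-1, 1:-1] = 0                                    # only the interior may be filled
cv.floodFill(A, mask, (3,0), 0, 0, 0, flags=4 | cv.FLOODFILL_MASK_ONLY)  # seed is (x, y)
print(mask)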
returned mask:
[[1 1 1 1 1 1]
[1 1 0 0 1 1]
[1 1 1 1 1 1]
[1 0 0 0 0 1]
[1 0 0 0 0 1]
[1 1 1 1 1 1]]
Expected mask:
[[1 1 1 1 1 1]
[1 0 0 0 0 1]
[1 0 0 0 0 1]
[1 1 1 1 1 1]
[1 1 1 1 1 1]
[1 1 1 1 1 1]]
Original Image:
[[0 1 1 0]
[0 0 0 0]
[1 1 1 1]
[1 1 1 1]]
If you look closely at the documentation, that's one of the purposes of the mask. You can call the function (2nd version) multiple times, each time with a different seed, and at the end the mask will contain the area that has been flood-filled. If a new seed belongs to an area already flood-filled, the call will return immediately.
Use the FLOODFILL_MASK_ONLY flag, and then use this mask to paint your input image with the desired fill color at the end with setTo() (you'll have to use a sub-image of the mask, removing the first and last rows and columns). Note that your flood fill might produce different results depending on the order in which you process your seed points if you set loDiff or upDiff to something other than the default value of zero.
Also take a look at this.
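If OpenCV is not a hard requirement, the multi-seed behaviour with loDiff = upDiff = 0 and 4-connectivity (flags=4) can be emulated with scipy.ndimage.label. A zero-tolerance flood fill from a seed reaches exactly the 4-connected region of pixels that share the seed's value, so a multi-seed fill is the union of the regions containing a seed. This is a swapped-in technique, not what cv.floodFill does internally; multi_seed_fill_mask is a hypothetical name, seeds use OpenCV's (x, y) order, and the returned array corresponds to the interior of the OpenCV mask (without its 1-pixel border).

```python
import numpy as np
from scipy import ndimage


def multi_seed_fill_mask(img, seeds):
    """Boolean mask of all pixels reached by zero-tolerance, 4-connected fills.

    seeds are (x, y) = (column, row) points, as in cv.floodFill.
    """
    filled = np.zeros(img.shape, dtype=bool)
    four = ndimage.generate_binary_structure(2, 1)   # cross: 4-connectivity
    # One labelling pass per distinct seed value, not per seed
    for value in {img[y, x] for x, y in seeds}:
        regions, _ = ndimage.label(img == value, structure=four)
        seed_ids = [regions[y, x] for x, y in seeds if img[y, x] == value]
        filled |= np.isin(regions, seed_ids)
    return filled


A = np.array([[0, 1, 1, 0],
              [0, 0, 0, 0],
              [1, 1, 1, 1],
              [1, 1, 1, 1]], dtype=np.uint8)
print(multi_seed_fill_mask(A, [(3, 0)]).astype(np.uint8))
```

For the single seed (3, 0) this reproduces the interior of the returned mask shown in the question; adding more seeds just extends the list.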

Counting of adjacent cells in a numpy array

It's past midnight and maybe someone has an idea how to tackle a problem of mine. I want to count the number of adjacent cells (i.e. the number of array fields with other values, e.g. zeros, in the vicinity of the array values) as a sum for each valid value.
import numpy
from scipy import ndimage
s = ndimage.generate_binary_structure(2, 2)  # 3x3 all-True, 8-connectivity (structure can vary)
a = numpy.zeros((6, 6), dtype=int)           # example array
a[2:4, 2:4] = 1; a[2, 4] = 1                 # with example value structure
print(a)
>[[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 1 1 1 0]
[0 0 1 1 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]]
# The value at position [2,4] is surrounded by 6 zeros, while the one at
# position [2,2] has 5 zeros in the vicinity if 's' is the assumed binary structure.
# Total sum of surrounding zeros is therefore 5+4+6+5+4 == 24 (for [2,2], [2,3], [2,4], [3,2], [3,3])
How can I count the number of zeros in this way if the structure of my values varies?
I believe I have to make use of SciPy's binary_dilation function, which can enlarge the value structure, but simply counting overlaps doesn't give me the correct sum, or does it?
print(ndimage.binary_dilation(a, s).astype(a.dtype))
[[0 0 0 0 0 0]
[0 1 1 1 1 1]
[0 1 1 1 1 1]
[0 1 1 1 1 1]
[0 1 1 1 1 0]
[0 0 0 0 0 0]]
Use a convolution to count neighbours:
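import numpy
import scipy.signal
a = numpy.zeros((6, 6), dtype=int)  # example array
a[2:4, 2:4] = 1; a[2, 4] = 1        # with example value structure
b = 1 - a                           # 1 where a is zero, 0 where a is one
c = scipy.signal.convolve2d(b, numpy.ones((3, 3)), mode='same')  # zeros in each 3x3 window
print(numpy.sum(c * a))             # sum only over windows centred on ones (24 here)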
b = 1-a allows us to count each zero while ignoring the ones.
We convolve with a 3x3 all-ones kernel, which sets each element to the sum of it and its 8 neighbouring values (other kernels are possible, such as the + kernel for only orthogonally adjacent values). With these summed values, we mask off the zeros in the original input (since we don't care about their neighbours), and sum over the whole array.
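The same idea carries over to any neighbourhood, which addresses the "structure can vary" part: use the binary structure itself as the kernel, with its centre zeroed so a cell never counts itself. A minimal sketch with scipy.ndimage.convolve; cells outside the array are not counted as zeros (mode='constant', cval=0 on the zero indicator), matching convolve2d's default zero fill. count_adjacent_zeros is a hypothetical name.

```python
import numpy as np
from scipy import ndimage


def count_adjacent_zeros(a, structure):
    """Per-cell count of zero neighbours (under `structure`) for each non-zero cell of a."""
    kernel = np.array(structure, dtype=int)
    kernel[tuple(k // 2 for k in kernel.shape)] = 0   # do not count the cell itself
    is_zero = (a == 0).astype(int)
    # The kernel is symmetric, so convolution and correlation agree here
    counts = ndimage.convolve(is_zero, kernel, mode='constant', cval=0)
    return counts * (a != 0)                           # keep counts only on non-zero cells


a = np.zeros((6, 6), dtype=int)
a[2:4, 2:4] = 1; a[2, 4] = 1
eight = ndimage.generate_binary_structure(2, 2)   # 3x3 all-True
cross = ndimage.generate_binary_structure(2, 1)   # + shape, orthogonal neighbours only
print(count_adjacent_zeros(a, eight).sum())   # 24
print(count_adjacent_zeros(a, cross).sum())   # 10
```

The per-cell array also gives the individual counts from the question, e.g. 6 at [2,4] and 5 at [2,2].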
I think you already got it. After dilation, the number of ones is 19; subtracting the 5 of the starting shape leaves 14, which is the number of zeros surrounding your shape. Your total of 24 has overlaps: zeros adjacent to several ones are counted several times.
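The dilation count can be checked directly. It measures something different from the convolution: 14 is the number of distinct zero cells touching the shape, while 24 counts each zero once per adjacent non-zero cell. A minimal sketch on the question's example array:

```python
import numpy as np
from scipy import ndimage

a = np.zeros((6, 6), dtype=int)
a[2:4, 2:4] = 1; a[2, 4] = 1
s = ndimage.generate_binary_structure(2, 2)

dilated = ndimage.binary_dilation(a, structure=s)
# Zero cells reached by the dilation = distinct zeros adjacent to the shape
ring = dilated & (a == 0)
print(dilated.sum(), a.sum(), ring.sum())   # 19 5 14
```

Using dilated & (a == 0) rather than a plain subtraction stays correct even if the structure does not include the centre cell.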
