arr= [1,2,3,4]
k = 4 (can be different)
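The result should be a 2D array with k rows, where row i is arr shifted right by i positions and zero-padded to length len(arr) + k - 1 (expected output below). How can I do this without using any loop and without hard-coding k?
k and arr can vary with the input.
Must use numpy.pad.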
[[1,2,3,4,0,0,0], #k-1 zeros
[0,1,2,3,4,0,0],
[0,0,1,2,3,4,0],
[0,0,0,1,2,3,4]]
If you really have to do it without a loop (for educational purposes):
np.pad(np.tile(arr,[k,1]), [(0,0),(0,k)]).reshape(-1)[:-k].reshape(k,-1)
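To see why this one-liner works, it may help to split it into steps. The following is only a sketch; the helper name shifted_rows and the intermediate names are made up for illustration, and k >= 1 is assumed. Each of the k rows gets k trailing zeros, and dropping the last k elements of the flattened array leaves k * (n + k - 1) values, so re-cutting them into rows of width n + k - 1 makes every row start one position earlier in the data, which produces the shift.

```python
import numpy as np

def shifted_rows(arr, k):
    """Hypothetical helper: row i is arr shifted right by i, zero-padded (k >= 1)."""
    arr = np.asarray(arr)
    tiled = np.tile(arr, [k, 1])              # k copies of arr, shape (k, n)
    padded = np.pad(tiled, [(0, 0), (0, k)])  # k trailing zeros per row, shape (k, n + k)
    flat = padded.reshape(-1)[:-k]            # drop the last k zeros
    return flat.reshape(k, -1)                # rows of width n + k - 1

print(shifted_rows([1, 2, 3, 4], 4))
```

No Python-level loop is involved, but the intermediate array holds k * (n + k) elements, so this is a demonstration rather than a memory-efficient approach.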
Using a list comprehension as a one-liner:
import numpy as np
arr= np.array([1,2,3,4])
k = 4
print( np.array( [ np.pad(arr, (0+i , k-1-i ) ) for i in range(0,k)] ) )
Out :
[[1 2 3 4 0 0 0]
[0 1 2 3 4 0 0]
[0 0 1 2 3 4 0]
[0 0 0 1 2 3 4]]
I am looking for the coordinates of connected blobs in a binary image (2d numpy array of 0 or 1).
The skimage library provides a very fast way to label blobs within the array (which I found from similar SO posts). However, I want a list of the coordinates of each blob, not a labelled array. I have a solution which extracts the coordinates from the labelled image, but it is very slow, far slower than the initial labelling.
Minimal Reproducible example:
import timeit
from skimage import measure
import numpy as np
binary_image = np.array([
[0,1,0,0,1,1,0,1,1,0,0,1],
[0,1,0,1,1,1,0,1,1,1,0,1],
[0,0,0,0,0,0,0,1,1,1,0,0],
[0,1,1,1,1,0,0,0,0,1,0,0],
[0,0,0,0,0,0,0,1,1,1,0,0],
[0,0,1,0,0,0,0,0,0,0,0,0],
[0,1,0,0,1,1,0,1,1,0,0,1],
[0,0,0,0,0,0,0,1,1,1,0,0],
[0,1,1,1,1,0,0,0,0,1,0,0],
])
print(f"\n\n2d array of type: {type(binary_image)}:")
print(binary_image)
labels = measure.label(binary_image)
print(f"\n\n2d array with connected blobs labelled of type {type(labels)}:")
print(labels)
def extract_blobs_from_labelled_array(labelled_array):
    # The goal is to obtain lists of the coordinates
    # of each distinct blob.
    blobs = []
    label = 1
    while True:
        indices_of_label = np.where(labelled_array==label)
        if not indices_of_label[0].size > 0:
            break
        else:
            blob = list(zip(*indices_of_label))
            label += 1
            blobs.append(blob)
    return blobs
if __name__ == "__main__":
    print("\n\nBeginning extract_blobs_from_labelled_array timing\n")
    print("Time taken:")
    print(
        timeit.timeit(
            'extract_blobs_from_labelled_array(labels)',
            globals=globals(),
            number=1
        )
    )
    print("\n\n")
Output:
2d array of type: <class 'numpy.ndarray'>:
[[0 1 0 0 1 1 0 1 1 0 0 1]
[0 1 0 1 1 1 0 1 1 1 0 1]
[0 0 0 0 0 0 0 1 1 1 0 0]
[0 1 1 1 1 0 0 0 0 1 0 0]
[0 0 0 0 0 0 0 1 1 1 0 0]
[0 0 1 0 0 0 0 0 0 0 0 0]
[0 1 0 0 1 1 0 1 1 0 0 1]
[0 0 0 0 0 0 0 1 1 1 0 0]
[0 1 1 1 1 0 0 0 0 1 0 0]]
2d array with connected blobs labelled of type <class 'numpy.ndarray'>:
[[ 0 1 0 0 2 2 0 3 3 0 0 4]
[ 0 1 0 2 2 2 0 3 3 3 0 4]
[ 0 0 0 0 0 0 0 3 3 3 0 0]
[ 0 5 5 5 5 0 0 0 0 3 0 0]
[ 0 0 0 0 0 0 0 3 3 3 0 0]
[ 0 0 6 0 0 0 0 0 0 0 0 0]
[ 0 6 0 0 7 7 0 8 8 0 0 9]
[ 0 0 0 0 0 0 0 8 8 8 0 0]
[ 0 10 10 10 10 0 0 0 0 8 0 0]]
Beginning extract_blobs_from_labelled_array timing
Time taken:
9.346099977847189e-05
9e-05 is small but so is this image for the example. In reality I am working with very high resolution images for which the function takes approximately 10 minutes.
Is there a faster way to do this?
Side note: I'm only using list(zip()) to try to get the numpy coordinates into something I'm used to (I don't use numpy much, just Python). Should I be skipping this and just using the coordinates to index as-is? Will that speed it up?
The part of the code that is slow is here:
while True:
    indices_of_label = np.where(labelled_array==label)
    if not indices_of_label[0].size > 0:
        break
    else:
        blob = list(zip(*indices_of_label))
        label += 1
        blobs.append(blob)
First, a complete aside: you should avoid using while True when you know the number of elements you will be iterating over. It's a recipe for hard-to-find infinite-loop bugs.
Instead, you should use:
for label in range(1, np.max(labelled_array) + 1):
(labels start at 1, since 0 is the background), and then you can drop the if ...: break check.
A second issue is indeed that you are using list(zip(*)), which is slow compared to NumPy functions. Here you could get approximately the same result with np.transpose(indices_of_label), which will get you a 2D array of shape (n_coords, n_dim), i.e. (n_coords, 2).
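As a small illustration of that difference, here is a sketch on a made-up 3x3 labelled array (the array and variable names are only for demonstration). Both forms hold the same coordinates; np.transpose just keeps them in a single NumPy array instead of building a Python list of tuples.

```python
import numpy as np

# Toy labelled array, invented for this example.
labelled = np.array([[0, 1, 0],
                     [2, 1, 0],
                     [2, 0, 0]])

indices_of_label = np.where(labelled == 1)   # tuple: (row_indices, col_indices)
as_tuples = list(zip(*indices_of_label))     # Python list of (row, col) tuples
as_array = np.transpose(indices_of_label)    # ndarray of shape (n_coords, 2)

print(as_tuples)
print(as_array)
```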
But the Big Issue is the expression labelled_array == label. This will examine every pixel of the image once for every label. (Twice, actually, because then you run np.where(), which takes another pass.) This is a lot of unnecessary work, as the coordinates can be found in one pass.
The scikit-image function skimage.measure.regionprops can do this for you. regionprops goes over the image once and returns a list containing one RegionProperties object per label. The object has a .coords attribute containing the coordinates of each pixel in the blob. So, here's your code, modified to use that function:
import timeit
from skimage import measure
import numpy as np
binary_image = np.array([
[0,1,0,0,1,1,0,1,1,0,0,1],
[0,1,0,1,1,1,0,1,1,1,0,1],
[0,0,0,0,0,0,0,1,1,1,0,0],
[0,1,1,1,1,0,0,0,0,1,0,0],
[0,0,0,0,0,0,0,1,1,1,0,0],
[0,0,1,0,0,0,0,0,0,0,0,0],
[0,1,0,0,1,1,0,1,1,0,0,1],
[0,0,0,0,0,0,0,1,1,1,0,0],
[0,1,1,1,1,0,0,0,0,1,0,0],
])
print(f"\n\n2d array of type: {type(binary_image)}:")
print(binary_image)
labels = measure.label(binary_image)
print(f"\n\n2d array with connected blobs labelled of type {type(labels)}:")
print(labels)
def extract_blobs_from_labelled_array(labelled_array):
    """Return a list containing coordinates of pixels in each blob."""
    props = measure.regionprops(labelled_array)
    blobs = [p.coords for p in props]
    return blobs
if __name__ == "__main__":
    print("\n\nBeginning extract_blobs_from_labelled_array timing\n")
    print("Time taken:")
    print(
        timeit.timeit(
            'extract_blobs_from_labelled_array(labels)',
            globals=globals(),
            number=1
        )
    )
    print("\n\n")
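If you want to stay in plain NumPy, the same "don't rescan the image for every label" idea can be sketched with one sort instead of one comparison per label. This is not what regionprops does internally, just an alternative under the assumption that 0 is the background label; the function name blob_coords_by_sorting is made up for this example.

```python
import numpy as np

def blob_coords_by_sorting(labelled_array):
    """Hypothetical helper: list of (n_coords, 2) arrays, one per label present."""
    labelled_array = np.asarray(labelled_array)
    rows, cols = np.nonzero(labelled_array)        # coordinates of all non-background pixels
    values = labelled_array[rows, cols]            # their labels
    if values.size == 0:
        return []
    order = np.argsort(values, kind="stable")      # group pixels by label
    coords = np.column_stack((rows[order], cols[order]))
    # Positions in the sorted label sequence where the label changes.
    boundaries = np.flatnonzero(np.diff(values[order])) + 1
    return np.split(coords, boundaries)

labels = np.array([[0, 1, 0],
                   [2, 1, 0],
                   [2, 0, 3]])
for blob in blob_coords_by_sorting(labels):
    print(blob.tolist())
```

Labels that do not occur in the array simply produce no entry, so the list index only matches label - 1 when the labels are consecutive, as they are in the output of measure.label.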
I have 2 matrices, and I want to perform a 'cell-wise' addition; however, the matrices aren't the same size. I want to preserve the cells' relative positions during the calculation (i.e. their 'co-ordinates' from the top left), so a simple (if maybe not the best) solution seems to be to pad the smaller matrix's x and y with zeros.
This thread has a perfectly satisfactory answer for concatenating vertically, and this does work with my data. Following the suggestion in the answer, I also threw in the hstack, but at the moment it's complaining that the dimensions (excluding the concatenation axis) need to match exactly. Perhaps hstack doesn't work as I anticipate, or isn't exactly equivalent to vstack, but I'm at a bit of a loss now.
This is what hstack throws at me, meanwhile vstack seems to have no problem.
ValueError: all the input array dimensions except for the concatenation axis must match exactly
Essentially the code checks which of a pair of matrices is the shorter and/or wider, and then pads the smaller matrix with zeros to match.
Here's the code I have:
import numpy as np
A = np.random.randint(2, size = (3, 7))
B = np.random.randint(2, size = (5, 10))
# If the arrays have different row numbers:
if A.shape[0] < B.shape[0]: # Is A shorter than B?
    A = np.vstack((A, np.zeros((B.shape[0] - A.shape[0], A.shape[1]))))
elif A.shape[0] > B.shape[0]: # or is A longer than B?
    B = np.vstack((B, np.zeros((A.shape[0] - B.shape[0], B.shape[1]))))
# If they have different column numbers
if A.shape[1] < B.shape[1]: # Is A narrower than B?
    A = np.hstack((A, np.zeros((B.shape[1] - A.shape[1], A.shape[0]))))
elif A.shape[1] > B.shape[1]: # or is A wider than B?
    B = np.hstack((B, np.zeros((A.shape[1] - B.shape[1], B.shape[0]))))
It's getting late, so it's possible I've just missed something obvious with hstack, but I can't see my logic error at the moment.
Just use np.pad :
np.pad(A,((0,2),(0,3)),'constant') # 2 is 5-3, 3 is 10-7
[[0 1 1 0 1 0 0 0 0 0]
[1 0 0 1 0 1 0 0 0 0]
[1 0 1 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]]
But the four pad widths must be computed first, so another simple
method to pad the two arrays in every case is:
A = np.ones((3, 7),int)
B = np.ones((5, 2),int)
ma,na = A.shape
mb,nb = B.shape
m,n = max(ma,mb) , max(na,nb)
newA = np.zeros((m,n),A.dtype)
newA[:ma,:na]=A
newB = np.zeros((m,n),B.dtype)
newB[:mb,:nb]=B
Result (newA, then newB):
[[1 1 1 1 1 1 1]
[1 1 1 1 1 1 1]
[1 1 1 1 1 1 1]
[0 0 0 0 0 0 0]
[0 0 0 0 0 0 0]]
[[1 1 0 0 0 0 0]
[1 1 0 0 0 0 0]
[1 1 0 0 0 0 0]
[1 1 0 0 0 0 0]
[1 1 0 0 0 0 0]]
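For completeness, the pad widths for np.pad can also be derived from the two shapes rather than worked out by hand. A minimal sketch, assuming 2D inputs and padding only at the bottom and right so that the top-left coordinates are preserved; pad_to_common_shape is a name invented for this example:

```python
import numpy as np

def pad_to_common_shape(A, B):
    """Hypothetical helper: zero-pad A and B (bottom/right) to a shared shape."""
    m = max(A.shape[0], B.shape[0])
    n = max(A.shape[1], B.shape[1])
    pad_A = ((0, m - A.shape[0]), (0, n - A.shape[1]))  # (rows after, cols after)
    pad_B = ((0, m - B.shape[0]), (0, n - B.shape[1]))
    return np.pad(A, pad_A, mode="constant"), np.pad(B, pad_B, mode="constant")

A = np.ones((3, 7), int)
B = np.ones((5, 2), int)
newA, newB = pad_to_common_shape(A, B)
print(newA + newB)
```

Both results have shape (5, 7) here, so the cell-wise addition works regardless of which array is taller or wider.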
I think your hstack lines should be of the form
np.hstack((A, np.zeros((A.shape[0], B.shape[1] - A.shape[1]))))
You seem to have the rows and columns swapped.
Yes, indeed. You should change (B.shape[1] - A.shape[1], A.shape[0]) to (A.shape[0], B.shape[1] - A.shape[1]), and likewise for B, because the arrays need the same number of rows to be stacked horizontally.
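Putting that fix into the code from the question gives roughly the following. Treat it as a sketch: the dtype=... arguments are an addition of mine (np.zeros defaults to float, which would otherwise turn the integer arrays into float arrays), everything else is the original logic with the shape tuples corrected.

```python
import numpy as np

A = np.random.randint(2, size=(3, 7))
B = np.random.randint(2, size=(5, 10))

# Rows: the zero block must have as many columns as the array it extends.
if A.shape[0] < B.shape[0]:
    A = np.vstack((A, np.zeros((B.shape[0] - A.shape[0], A.shape[1]), dtype=A.dtype)))
elif A.shape[0] > B.shape[0]:
    B = np.vstack((B, np.zeros((A.shape[0] - B.shape[0], B.shape[1]), dtype=B.dtype)))

# Columns: the zero block must have as many rows as the array it extends.
if A.shape[1] < B.shape[1]:
    A = np.hstack((A, np.zeros((A.shape[0], B.shape[1] - A.shape[1]), dtype=A.dtype)))
elif A.shape[1] > B.shape[1]:
    B = np.hstack((B, np.zeros((B.shape[0], A.shape[1] - B.shape[1]), dtype=B.dtype)))

print(A.shape, B.shape)
```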
Try b[:a.shape[0], :a.shape[1]] = b[:a.shape[0], :a.shape[1]]+a, where b is the larger array (at least as large as a in both dimensions).
Example below:
import numpy as np
a = np.arange(12).reshape(3, 4)
print("a\n", a)
b = np.arange(16).reshape(4, 4)
print("b original\n", b)
b[:a.shape[0], :a.shape[1]] = b[:a.shape[0], :a.shape[1]]+a
print("b new\n",b)
output
a
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
b original
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
b new
[[ 0 2 4 6]
[ 8 10 12 14]
[16 18 20 22]
[12 13 14 15]]
I have a 2D labeled image (numpy array), each label represents an object. I have to find the object's center and its area. My current solution:
centers = [np.mean(np.where(label_2d == i),1) for i in range(1,num_obj+1)]
surface_area = np.array([np.sum(label_2d == i) for i in range(1,num_obj+1)])
Note that the label_2d used for centers is not the same as the one used for surface area, so I can't combine both operations. My current code is about 10-100 times too slow.
In C++ I would iterate through the image once (2 for loops) and fill a table (an array), from which I would then calculate centers and surface area.
Since for loops are quite slow in python, I have to find another solution. Any advice?
You could use the center_of_mass function from scipy.ndimage for the first problem (older SciPy releases also exposed it as scipy.ndimage.measurements.center_of_mass, a namespace that is now deprecated) and then use np.bincount for the second problem. Because these live in mainstream libraries, they are implemented in compiled code, so you can expect decent speed gains.
Example:
>>> import numpy as np
>>> from scipy.ndimage import center_of_mass
>>>
>>> a = np.zeros((10,10), dtype=int)
>>> # add some labels:
>>> a[3:5, 1:3] = 1
>>> a[7:9, 0:3] = 2
>>> a[5:6, 4:9] = 3
>>> print(a)
[[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 1 1 0 0 0 0 0 0 0]
[0 1 1 0 0 0 0 0 0 0]
[0 0 0 0 3 3 3 3 3 0]
[0 0 0 0 0 0 0 0 0 0]
[2 2 2 0 0 0 0 0 0 0]
[2 2 2 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]]
>>>
>>> num_obj = 3
>>> surface_areas = np.bincount(a.flat)[1:]
>>> centers = center_of_mass(a, labels=a, index=range(1, num_obj+1))
>>> print(surface_areas)
[4 6 5]
>>> print(centers)
[(3.5, 1.5), (7.5, 1.0), (5.0, 6.0)]
Speed gains depend on the size of your input data though, so I can't make any serious estimates on that. Would be nice if you could add that info (size of a, number of labels, timing results for the method you used and these functions) in the comments.
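If you prefer to mirror the C++ idea of filling one table in a single pass, np.bincount with its weights argument can accumulate per-label coordinate sums as well as counts. The following is a sketch, not the center_of_mass implementation; areas_and_centers is a made-up name, and it assumes non-negative integer labels with every label in 1..num_obj present (an absent label would give a division by zero).

```python
import numpy as np

def areas_and_centers(label_2d, num_obj):
    """Hypothetical helper: per-label pixel counts and centroids via np.bincount."""
    rows, cols = np.indices(label_2d.shape)   # row/column index of every pixel
    flat = label_2d.ravel()
    size = num_obj + 1                        # slot 0 is the background
    counts = np.bincount(flat, minlength=size)
    row_sums = np.bincount(flat, weights=rows.ravel(), minlength=size)
    col_sums = np.bincount(flat, weights=cols.ravel(), minlength=size)
    areas = counts[1:size]
    centers = np.column_stack((row_sums[1:size], col_sums[1:size])) / areas[:, None]
    return areas, centers

a = np.zeros((10, 10), dtype=int)
a[3:5, 1:3] = 1
a[7:9, 0:3] = 2
a[5:6, 4:9] = 3
areas, centers = areas_and_centers(a, 3)
print(areas)
print(centers)
```

Since the centers and the areas come from different label images in the question, the helper would be called once per image and only the relevant output used.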
It's past midnight, and maybe someone has an idea how to tackle a problem of mine. For each non-zero cell of an array, I want to count the adjacent cells that hold another value (e.g. the zeros in its neighbourhood), and then sum these counts over all non-zero cells.
Example:
import numpy
from scipy import ndimage
s = ndimage.generate_binary_structure(2,2) # Structure can vary
a = numpy.zeros((6,6), dtype=int) # Example array
a[2:4, 2:4] = 1;a[2,4] = 1 # with example value structure
print(a)
[[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 1 1 1 0]
[0 0 1 1 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]]
# The value at position [2,4] is surrounded by 6 zeros, while the one at
# position [2,2] has 5 zeros in the vicinity if 's' is the assumed binary structure.
# Total sum of surrounding zeroes is therefore sum(5+4+6+4+5) == 24
How can I count the zeros in this way if the structure of my values varies?
I suspect I need SciPy's binary_dilation function, which can enlarge the value structure, but simply counting the overlap doesn't seem to give me the correct sum. Or does it?
print(ndimage.binary_dilation(a,s).astype(a.dtype))
[[0 0 0 0 0 0]
[0 1 1 1 1 1]
[0 1 1 1 1 1]
[0 1 1 1 1 1]
[0 1 1 1 1 0]
[0 0 0 0 0 0]]
Use a convolution to count neighbours:
import numpy
import scipy.signal
a = numpy.zeros((6,6), dtype=int) # Example array
a[2:4, 2:4] = 1;a[2,4] = 1 # with example value structure
b = 1-a
c = scipy.signal.convolve2d(b, numpy.ones((3,3)), mode='same')
print(numpy.sum(c * a))
b = 1-a allows us to count each zero while ignoring the ones.
We convolve with a 3x3 all-ones kernel, which sets each element to the sum of it and its 8 neighbouring values (other kernels are possible, such as the + kernel for only orthogonally adjacent values). With these summed values, we mask off the zeros in the original input (since we don't care about their neighbours), and sum over the whole array.
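The same counting can be sketched without SciPy, which makes the individual steps visible. This swaps convolve2d for NumPy's sliding_window_view (available from NumPy 1.20) and only covers the 3x3 all-ones kernel; it assumes a binary 0/1 array, and, like mode='same' with the default zero fill, it does not count cells outside the array as zeros. The function name is made up.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def count_surrounding_zeros(a):
    """Hypothetical helper: sum over all 1-cells of the zeros in their 3x3 neighbourhood."""
    a = np.asarray(a)
    b = 1 - a                                      # 1 where a is zero
    padded = np.pad(b, 1)                          # zero border, so edges see fewer zeros
    windows = sliding_window_view(padded, (3, 3))  # shape (rows, cols, 3, 3)
    zeros_nearby = windows.sum(axis=(2, 3))        # zeros in each 3x3 neighbourhood
    return int((zeros_nearby * a).sum())           # keep only positions where a is 1

a = np.zeros((6, 6), dtype=int)
a[2:4, 2:4] = 1
a[2, 4] = 1
print(count_surrounding_zeros(a))  # 24
```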
I think you already got it. After dilation, the number of 1s is 19; minus the 5 cells of the starting shape, that leaves 14, which is the number of distinct zeros surrounding your shape. Your total of 24 counts some zeros more than once.
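To check the 19 - 5 = 14 arithmetic without SciPy, a 3x3 dilation can be imitated by taking the maximum over each 3x3 window. This is only a sketch for the full 3x3 structuring element (the one generate_binary_structure(2,2) yields), using NumPy's sliding_window_view (NumPy 1.20+); the function name is invented.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def count_distinct_border_zeros(a):
    """Hypothetical helper: number of zero cells touching the shape (8-connectivity)."""
    a = np.asarray(a)
    padded = np.pad(a, 1)                          # zero border around the array
    windows = sliding_window_view(padded, (3, 3))  # shape (rows, cols, 3, 3)
    dilated = windows.max(axis=(2, 3))             # 1 wherever any neighbour is 1
    return int(dilated.sum() - a.sum())            # dilated cells minus the shape itself

a = np.zeros((6, 6), dtype=int)
a[2:4, 2:4] = 1
a[2, 4] = 1
print(count_distinct_border_zeros(a))  # 14
```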