So I have a program that at some point creates random arrays, and I have to perform an operation that adds rows while replacing other rows based on the values found in them. One of the random arrays will look something like this, but keep in mind that it can randomly vary in size from 3x3 up to 10x10:
0 2 0 1
1 0 0 1
1 0 2 1
2 0 1 2
For every row that has at least one value equal to 2, I need to remove/replace the row and add some more rows. The number of rows added depends on the number of possible combinations of 0s and 1s whose number of digits equals the number of 2s counted in that row. Each added row places one of these combinations in the positions where the 2s are located. The result I'm looking for will look like this:
0 1 0 1 # First combination to replace 0 2 0 1
0 0 0 1 # Second combination to replace 0 2 0 1 (Only 2 combinations, only one 2)
1 0 0 1 # Stays the same
1 0 1 1 # First combination to replace 1 0 2 1
1 0 0 1 # Second combination to replace 1 0 2 1 (Only 2 combinations, only one 2)
0 0 1 0 # First combination to replace 2 0 1 2
0 0 1 1 # Second combination to replace 2 0 1 2
1 0 1 1 # Third combination to replace 2 0 1 2
1 0 1 0 # Fourth combination to replace 2 0 1 2 (4 combinations, there are two 2s)
If you know a NumPy way of accomplishing this I will be grateful.
You can try the following. Create a sample array:
import numpy as np
np.random.seed(5)
a = np.random.randint(0, 3, (4, 4))
print(a)
This gives:
[[2 1 2 2]
[0 1 0 0]
[2 0 2 0]
[0 1 1 0]]
Compute the output array:
ts = (a == 2).sum(axis=1)  # number of 2s in each row
# For each row with t 2s, stack all 2**t combinations of 0/1 (each combination is a contiguous group of t values)
r = np.hstack([np.array(np.meshgrid(*[[0, 1]] * t)).reshape(t, -1).T.ravel() for t in ts if t])
out = np.repeat(a, 2**ts, axis=0)  # repeat each row once per combination
out[out == 2] = r  # write the combinations into the positions of the 2s
print(out)
Result:
[[0 1 0 0]
[0 1 0 1]
[1 1 0 0]
[1 1 0 1]
[0 1 1 0]
[0 1 1 1]
[1 1 1 0]
[1 1 1 1]
[0 1 0 0]
[0 0 0 0]
[1 0 0 0]
[0 0 1 0]
[1 0 1 0]
[0 1 1 0]]
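As a quick check (not part of the original answer), applying the same few lines to the 4x4 array from the question should reproduce each group of replacement rows, although the order within a group may differ from the listing in the question:
import numpy as np
a = np.array([[0, 2, 0, 1],
              [1, 0, 0, 1],
              [1, 0, 2, 1],
              [2, 0, 1, 2]])
ts = (a == 2).sum(axis=1)
r = np.hstack([np.array(np.meshgrid(*[[0, 1]] * t)).reshape(t, -1).T.ravel() for t in ts if t])
out = np.repeat(a, 2**ts, axis=0)
out[out == 2] = r
print(out)
# Expected (order within each group may vary with the meshgrid convention):
# [[0 0 0 1]
#  [0 1 0 1]
#  [1 0 0 1]
#  [1 0 0 1]
#  [1 0 1 1]
#  [0 0 1 0]
#  [1 0 1 0]
#  [0 0 1 1]
#  [1 0 1 1]]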
Not the prettiest code but it does the job. You could clean up the itertools calls but this lets you see how it works.
import numpy as np
import itertools

X = np.array([[0, 2, 0, 1],
              [1, 0, 0, 1],
              [1, 0, 2, 1],
              [2, 0, 1, 2]])

def add(X_, Y):
    # Stack a new row onto Y, handling the initially empty case
    if Y.size == 0:
        Y = X_
    else:
        Y = np.vstack((Y, X_))
    return Y

Y = np.array([])
for i in range(len(X)):
    if 2 not in X[i, :]:
        Y = add(X[i, :], Y)  # no 2s: keep the row as is
    else:
        a = np.where(X[i, :] == 2)[0]  # column positions of the 2s
        n = [[i for i in itertools.chain([1, 0])] for _ in range(len(a))]  # one [1, 0] list per 2
        m = list(itertools.product(*n))  # all 0/1 combinations
        for j in range(len(m)):
            M = 1 * X[i, :]  # copy of the row
            u = list(m[j])
            for k in range(len(a)):
                M[a[k]] = u[k]  # place the combination at the positions of the 2s
            Y = add(M, Y)

print(Y)
#[[0 1 0 1]
# [0 0 0 1]
# [1 0 0 1]
# [1 0 1 1]
# [1 0 0 1]
# [1 0 1 1]
# [1 0 1 0]
# [0 0 1 1]
# [0 0 1 0]]
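Since the answer notes that the itertools calls could be cleaned up, here is a sketch of the same loop written with itertools.product([1, 0], repeat=...) directly; it should produce the same rows in the same order:
import itertools
import numpy as np

X = np.array([[0, 2, 0, 1],
              [1, 0, 0, 1],
              [1, 0, 2, 1],
              [2, 0, 1, 2]])

rows = []
for row in X:
    twos = np.where(row == 2)[0]  # column indices holding a 2
    if twos.size == 0:
        rows.append(row)  # no 2s: keep the row unchanged
        continue
    for combo in itertools.product([1, 0], repeat=len(twos)):
        new_row = row.copy()
        new_row[twos] = combo  # write the 0/1 combination over the 2s
        rows.append(new_row)

print(np.vstack(rows))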
I'm struggling to write a function that would seamlessly apply to any NumPy array, whatever its dimension.
At one point in my code, I have boolean arrays that I treat as masks for other arrays (0 = not passing, 1 = passing).
I would like to "enlarge" those mask arrays by overwriting the zeros adjacent to ones, over a defined range.
Example:
input = [0,0,0,0,0,1,0,0,0,0,1,0,0,0]
enlarged_by_1 = [0,0,0,0,1,1,1,0,0,1,1,1,0,0]
enlarged_by_2 = [0,0,0,1,1,1,1,1,1,1,1,1,1,0]
input = [[0,0,0,1,0,0,1,0],
[0,1,0,0,0,0,0,0],
[0,0,0,0,0,0,1,0]]
enlarged_by_1 = [[0,0,1,1,1,1,1,1],
[1,1,1,0,0,0,0,0],
[0,0,0,0,0,1,1,1]]
This is pretty straightforward when the inputs are 1D.
However, I would like this function to seamlessly handle 1D arrays, matrices, 3D arrays, and so on.
For a matrix, the same logic would be applied to each row.
I read about Ellipsis, but it does not seem to be applicable in my case.
Flattening the input, applying the logic, and reshaping the array could lead to contamination between the individual rows.
I do not want to resort to testing the shape of the input array or to a recursive function, as that does not seem very clean to me.
Would you have any suggestions?
The operation that you describe is very much like a convolution followed by clipping, to ensure that the values remain 0 or 1.
For your example input:
import numpy as np
input = np.array([0,0,0,0,0,1,0,0,0,0,1,0,0,0], dtype=int)
print(input)
def enlarge_ones(x, k):
    # Convolve with a window of ones of width 2k+1, then clip the sums back to {0, 1}
    mask = np.ones(2*k+1, dtype=int)
    return np.clip(np.convolve(x, mask, mode='same'), 0, 1).astype(int)

print(enlarge_ones(input, k=1))
print(enlarge_ones(input, k=3))
which yields
[0 0 0 0 0 1 0 0 0 0 1 0 0 0]
[0 0 0 0 1 1 1 0 0 1 1 1 0 0]
[0 0 1 1 1 1 1 1 1 1 1 1 1 1]
numpy.convolve only works for 1-d arrays. However, one can loop over the number of array dimensions and, within that, over every 1-d slice: for a 2-d matrix, first operate on every row and then on every column, and likewise for nd-arrays with more dimensions. With that, enlarge_ones becomes something like:
def enlarge_ones(x, k):
    n = len(x.shape)
    if n == 1:
        mask = np.ones(2*k+1, dtype=int)
        return np.clip(np.convolve(x, mask, mode='same')[:len(x)], 0, 1).astype(int)
    else:
        x = x.copy()
        for d in range(n):
            for i in np.ndindex(x.shape[:-1]):
                x[i] = enlarge_ones(x[i], k)  # x[i] is 1-d
            x = x.transpose(list(range(1, n)) + [0])
        return x
Note the use of np.transpose to rotate the dimensions so that np.convolve is applied along the 1-d slices of each dimension in turn. This is done exactly n times, which returns the array to its original shape at the end.
x = np.zeros((3, 5, 7), dtype=int)
x[1, 2, 2] = 1
print(x)
print(enlarge_ones(x, k=1))
[[[0 0 0 0 0 0 0]
[0 0 0 0 0 0 0]
[0 0 0 0 0 0 0]
[0 0 0 0 0 0 0]
[0 0 0 0 0 0 0]]
[[0 0 0 0 0 0 0]
[0 0 0 0 0 0 0]
[0 0 1 0 0 0 0]
[0 0 0 0 0 0 0]
[0 0 0 0 0 0 0]]
[[0 0 0 0 0 0 0]
[0 0 0 0 0 0 0]
[0 0 0 0 0 0 0]
[0 0 0 0 0 0 0]
[0 0 0 0 0 0 0]]]
[[[0 0 0 0 0 0 0]
[0 1 1 1 0 0 0]
[0 1 1 1 0 0 0]
[0 1 1 1 0 0 0]
[0 0 0 0 0 0 0]]
[[0 0 0 0 0 0 0]
[0 1 1 1 0 0 0]
[0 1 1 1 0 0 0]
[0 1 1 1 0 0 0]
[0 0 0 0 0 0 0]]
[[0 0 0 0 0 0 0]
[0 1 1 1 0 0 0]
[0 1 1 1 0 0 0]
[0 1 1 1 0 0 0]
[0 0 0 0 0 0 0]]]
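If, as in the 2-d example from the question, you only want the enlargement applied independently along the last axis (each row on its own) rather than in every dimension, a minimal sketch using np.apply_along_axis could look like this (the name enlarge_rows is just for illustration):
import numpy as np

def enlarge_rows(x, k):
    # Apply the 1-d convolution trick along the last axis only
    mask = np.ones(2 * k + 1, dtype=int)
    smear = lambda row: np.clip(np.convolve(row, mask, mode='same'), 0, 1)
    return np.apply_along_axis(smear, -1, np.atleast_1d(x)).astype(int)

print(enlarge_rows(np.array([[0, 0, 0, 1, 0, 0, 1, 0],
                             [0, 1, 0, 0, 0, 0, 0, 0],
                             [0, 0, 0, 0, 0, 0, 1, 0]]), k=1))
# This should match the enlarged_by_1 matrix from the question.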
I am looking for the coordinates of connected blobs in a binary image (2d numpy array of 0 or 1).
The skimage library provides a very fast way to label blobs within the array (which I found from similar SO posts). However, I want a list of the coordinates of each blob, not a labelled array. I have a solution which extracts the coordinates from the labelled image, but it is very slow, far slower than the initial labelling.
Minimal Reproducible example:
import timeit
from skimage import measure
import numpy as np
binary_image = np.array([
[0,1,0,0,1,1,0,1,1,0,0,1],
[0,1,0,1,1,1,0,1,1,1,0,1],
[0,0,0,0,0,0,0,1,1,1,0,0],
[0,1,1,1,1,0,0,0,0,1,0,0],
[0,0,0,0,0,0,0,1,1,1,0,0],
[0,0,1,0,0,0,0,0,0,0,0,0],
[0,1,0,0,1,1,0,1,1,0,0,1],
[0,0,0,0,0,0,0,1,1,1,0,0],
[0,1,1,1,1,0,0,0,0,1,0,0],
])
print(f"\n\n2d array of type: {type(binary_image)}:")
print(binary_image)
labels = measure.label(binary_image)
print(f"\n\n2d array with connected blobs labelled of type {type(labels)}:")
print(labels)
def extract_blobs_from_labelled_array(labelled_array):
    # The goal is to obtain lists of the coordinates
    # of each distinct blob.
    blobs = []
    label = 1
    while True:
        indices_of_label = np.where(labelled_array == label)
        if not indices_of_label[0].size > 0:
            break
        else:
            blob = list(zip(*indices_of_label))
            label += 1
            blobs.append(blob)
    return blobs
if __name__ == "__main__":
    print("\n\nBeginning extract_blobs_from_labelled_array timing\n")
    print("Time taken:")
    print(
        timeit.timeit(
            'extract_blobs_from_labelled_array(labels)',
            globals=globals(),
            number=1
        )
    )
    print("\n\n")
Output:
2d array of type: <class 'numpy.ndarray'>:
[[0 1 0 0 1 1 0 1 1 0 0 1]
[0 1 0 1 1 1 0 1 1 1 0 1]
[0 0 0 0 0 0 0 1 1 1 0 0]
[0 1 1 1 1 0 0 0 0 1 0 0]
[0 0 0 0 0 0 0 1 1 1 0 0]
[0 0 1 0 0 0 0 0 0 0 0 0]
[0 1 0 0 1 1 0 1 1 0 0 1]
[0 0 0 0 0 0 0 1 1 1 0 0]
[0 1 1 1 1 0 0 0 0 1 0 0]]
2d array with connected blobs labelled of type <class 'numpy.ndarray'>:
[[ 0 1 0 0 2 2 0 3 3 0 0 4]
[ 0 1 0 2 2 2 0 3 3 3 0 4]
[ 0 0 0 0 0 0 0 3 3 3 0 0]
[ 0 5 5 5 5 0 0 0 0 3 0 0]
[ 0 0 0 0 0 0 0 3 3 3 0 0]
[ 0 0 6 0 0 0 0 0 0 0 0 0]
[ 0 6 0 0 7 7 0 8 8 0 0 9]
[ 0 0 0 0 0 0 0 8 8 8 0 0]
[ 0 10 10 10 10 0 0 0 0 8 0 0]]
Beginning extract_blobs_from_labelled_array timing
Time taken:
9.346099977847189e-05
9e-05 seconds is small, but so is this example image. In reality I am working with very high resolution images, for which the function takes approximately 10 minutes.
Is there a faster way to do this?
Side note: I'm only using list(zip()) to try to get the NumPy coordinates into something I'm used to (I don't use NumPy much, just plain Python). Should I skip this and just use the coordinates to index as-is? Will that speed it up?
The part of the code that is slow is here:
while True:
    indices_of_label = np.where(labelled_array == label)
    if not indices_of_label[0].size > 0:
        break
    else:
        blob = list(zip(*indices_of_label))
        label += 1
        blobs.append(blob)
First, a complete aside: you should avoid using while True when you know the number of elements you will be iterating over. It's a recipe for hard-to-find infinite-loop bugs.
Instead, you should use:
for label in range(1, np.max(labels) + 1):
and then you can ignore the if ...: break.
A second issue is indeed that you are using list(zip(*)), which is slow compared to NumPy functions. Here you could get approximately the same result with np.transpose(indices_of_label), which gives you a 2D array of shape (n_coords, n_dim), i.e. (n_coords, 2).
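A tiny illustration of the difference, on a hypothetical toy array:
import numpy as np

labelled_array = np.array([[1, 1, 0],
                           [0, 2, 2]])
indices_of_label = np.where(labelled_array == 1)

print(list(zip(*indices_of_label)))    # list of (row, col) tuples, here (0, 0) and (0, 1)
print(np.transpose(indices_of_label))  # 2D array [[0 0], [0 1]] of shape (n_coords, 2)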
But the Big Issue is the expression labelled_array == label. This will examine every pixel of the image once for every label. (Twice, actually, because then you run np.where(), which takes another pass.) This is a lot of unnecessary work, as the coordinates can be found in one pass.
The scikit-image function skimage.measure.regionprops can do this for you. regionprops goes over the image once and returns a list containing one RegionProps object per label. The object has a .coords attribute containing the coordinates of each pixel in the blob. So, here's your code, modified to use that function:
import timeit
from skimage import measure
import numpy as np
binary_image = np.array([
[0,1,0,0,1,1,0,1,1,0,0,1],
[0,1,0,1,1,1,0,1,1,1,0,1],
[0,0,0,0,0,0,0,1,1,1,0,0],
[0,1,1,1,1,0,0,0,0,1,0,0],
[0,0,0,0,0,0,0,1,1,1,0,0],
[0,0,1,0,0,0,0,0,0,0,0,0],
[0,1,0,0,1,1,0,1,1,0,0,1],
[0,0,0,0,0,0,0,1,1,1,0,0],
[0,1,1,1,1,0,0,0,0,1,0,0],
])
print(f"\n\n2d array of type: {type(binary_image)}:")
print(binary_image)
labels = measure.label(binary_image)
print(f"\n\n2d array with connected blobs labelled of type {type(labels)}:")
print(labels)
def extract_blobs_from_labelled_array(labelled_array):
    """Return a list containing coordinates of pixels in each blob."""
    props = measure.regionprops(labelled_array)
    blobs = [p.coords for p in props]
    return blobs
if __name__ == "__main__":
    print("\n\nBeginning extract_blobs_from_labelled_array timing\n")
    print("Time taken:")
    print(
        timeit.timeit(
            'extract_blobs_from_labelled_array(labels)',
            globals=globals(),
            number=1
        )
    )
    print("\n\n")
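For the small example above, the first entry of the returned list should be the coordinate array of label 1 (the two pixels in the second column), something like:
blobs = extract_blobs_from_labelled_array(labels)
print(blobs[0])
# [[0 1]
#  [1 1]]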
I would like to replace all values of the subdiagonals below the k-th diagonal.
For example:
We first import the NumPy library:
import numpy as np
Then we create the matrix:
In [14]: matrix = np.matrix('1 1 1 1 1 1; 1 1 1 1 1 1; 1 1 1 1 1 1; 1 1 1 1 1 1; 1 1 1 1 1 1')
We then get:
In [15]: print(matrix)
Out[16]:
[[1 1 1 1 1 1]
[1 1 1 1 1 1]
[1 1 1 1 1 1]
[1 1 1 1 1 1]
[1 1 1 1 1 1]]
We then get the diagonals below the k-th diagonal, for k = 1 for example:
In [17]: lowerdiags = [np.diag(matrix, k=e+1).tolist() for e in range(-len(matrix), k)]
In [18]: print(lowerdiags)
Out[19]: [[1], [1, 1], [1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1]]
And I'm stuck there: knowing that we just found the subdiagonals, what should I add to replace all of their values with 0, so that for k = 1 the result is:
[[0 1 1 1 1 1]
[0 0 1 1 1 1]
[0 0 0 1 1 1]
[0 0 0 0 1 1]
[0 0 0 0 0 1]]
or even for k = 0:
[[1 1 1 1 1 1]
[0 1 1 1 1 1]
[0 0 1 1 1 1]
[0 0 0 1 1 1]
[0 0 0 0 1 1]]
Thank you for your help and your patience.
I found a way by using the NumPy function np.fill_diagonal and by iterating over the different values of k:
# Import numpy library
import numpy as np

def Exercise_3(matrix, k):
    # print initial matrix
    print(matrix)
    for k in range(-len(matrix)+1, k):
        if k < 0:
            # Smart slicing when filling diagonals with "np.fill_diagonal" on our matrix for lower diagonals
            np.fill_diagonal(matrix[-k:, :k], 0)
        if k > 0:
            # Smart slicing when filling diagonals with "np.fill_diagonal" on our matrix for upper diagonals
            np.fill_diagonal(matrix[:-k, k:], 0)
        if k == 0:
            # Just replace the main diagonal by 0
            np.fill_diagonal(matrix, 0)
        # print to see each change on the matrix
        #print(matrix)
        #print(k)
    return matrix
def main():
    k = 0
    # another way of creating a matrix
    #matrix = np.matrix('1 1 1 1 1 1; 1 1 1 1 1 1; 1 1 1 1 1 1; 1 1 1 1 1 1; 1 1 1 1 1 1; 1 1 1 1 1 1')
    # matrix of 5 rows and 5 columns filled with 1
    matrix = np.array(([1,1,1,1,1],[1,1,1,1,1],[1,1,1,1,1],[1,1,1,1,1],[1,1,1,1,1]))
    NewMatrix = Exercise_3(matrix, k)
    print(NewMatrix)

main()
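For reference, NumPy's built-in np.triu does the same thing in one call: it returns a copy of the matrix with every element below the k-th diagonal set to zero (unlike the in-place function above), so it can serve as a quick check:
import numpy as np

matrix = np.ones((5, 6), dtype=int)
print(np.triu(matrix, k=1))  # zeros the main diagonal and everything below it
print(np.triu(matrix, k=0))  # zeros everything strictly below the main diagonal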
Numpy Three Four Five Dimensional Array in Python
Input 1: 3
Output 1:
[[0 1 0]
[1 1 1]
[0 1 0]]
Input 2: 5
Output 2:
[[0 0 1 0 0]
[0 0 1 0 0]
[1 1 1 1 1]
[0 0 1 0 0]
[0 0 1 0 0]]
Notice that the 1s in the arrays make a shape like +.
My logic is shown below
a = np.zeros((n, n), dtype='int')
a[-3, :] = 1
a[:, -3] = 1
print(a)
This logic only works for the 5x5 array but not for the 3x3 array.
Can someone assist me to get the expected output for both the 3x3 and 5x5 arrays using np.zeros and integer division //?
Note that -n//2 in Python is parsed as (-n)//2, which gives -2 when n = 3 and -3 when n = 5, i.e. the index of the middle row/column counted from the end in both cases. So replace the hard-coded -3 with -n//2, as seen here:
import numpy as np

def create_plus_matrix(n):
    a = np.zeros((n, n), dtype='int')
    a[-n//2, :] = 1  # middle row
    a[:, -n//2] = 1  # middle column
    return a
So, let's try it out:
>>> create_plus_matrix(3)
[[0 1 0]
[1 1 1]
[0 1 0]]
>>> create_plus_matrix(5)
[[0 0 1 0 0]
[0 0 1 0 0]
[1 1 1 1 1]
[0 0 1 0 0]
[0 0 1 0 0]]
Do this
import numpy as np

def plus(size):
    a = np.zeros([size, size], dtype=int)
    a[int(size/2)] = np.ones(size)  # middle row
    for i in a:
        i[int(size/2)] = 1          # middle column
    return a

print(plus(3))  # 3 is the size
# Output
[[0 1 0]
[1 1 1]
[0 1 0]]
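Since the question explicitly asks to use integer division //, the same idea can be written with size // 2 instead of int(size/2); a minimal sketch:
import numpy as np

def plus(size):
    a = np.zeros((size, size), dtype=int)
    mid = size // 2   # 1 for size=3, 2 for size=5
    a[mid, :] = 1     # middle row
    a[:, mid] = 1     # middle column
    return a

print(plus(3))
print(plus(5))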