I would like to know the fastest way to extract the indices of the first n non-zero values per column in a 2D array.
For example, with the following array:
arr = np.array([
    [4, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 4, 0, 0],
    [2, 0, 9, 0],
    [6, 0, 0, 0],
    [0, 7, 0, 0],
    [3, 0, 0, 0],
    [1, 2, 0, 0],
])
With n=2 I would get [0, 0, 1, 1, 2] as xs and [0, 3, 2, 5, 3] as ys: two values in each of the first and second columns and one in the third.
Here is how it is currently done:
x = []
y = []
n = 2
for i, c in enumerate(arr.T):
    a = c.nonzero()[0][:n]  # row indices of the first n non-zero values in this column
    if len(a):
        x.extend([i] * len(a))
        y.extend(a)
In practice I have arrays of size (405, 256).
Is there a way to make it faster?
Here is a method that does not require sorting the array (a single linear scan finds the non-zero values), although it chains quite a few functions:
n = 2
# indices of the non-zero values, column indices first (hence the transpose)
nnull = np.stack(np.where(arr.T != 0))
# split the flat positions at each change of column index
cols_ids = np.array_split(range(len(nnull[0])), np.where(np.diff(nnull[0]) > 0)[0] + 1)
# take at most n from each group and concatenate the whole
np.concatenate([nnull[:, u[:n]] for u in cols_ids], axis=1)
outputs:
array([[0, 0, 1, 1, 2],
       [0, 3, 2, 5, 3]], dtype=int64)
Here is one approach using argsort, though it gives a different order:
n = 2
m = arr != 0
# sort non-zero entries to the top of each column; 'stable' keeps their original row order
idx = np.argsort(~m, axis=0, kind='stable')
# keep the first n rows and check they really are non-zero
m2 = np.take_along_axis(m, idx, axis=0)[:n]
y, x = np.where(m2)
# column indices and the corresponding row indices
x, idx[y, x]
# (array([0, 1, 2, 0, 1]), array([0, 2, 3, 3, 5]))
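If you need the column-grouped order of the other answers, a stable argsort on x restores it (a small sketch reusing the variables above):

order = np.argsort(x, kind='stable')  # group by column, keeping row order within each column
xs, ys = x[order], idx[y, x][order]
# xs -> array([0, 0, 1, 1, 2]), ys -> array([0, 3, 2, 5, 3])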
Compare the row output of the transposed nonzero with itself shifted by n positions:
>>> n = 2
>>> i, j = arr.T.nonzero()
>>> mask = np.concatenate([[True] * n, i[n:] != i[:-n]])
>>> i[mask], j[mask]
(array([0, 0, 1, 1, 2], dtype=int64), array([0, 3, 2, 5, 3], dtype=int64))
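This works because i (the column indices, since arr was transposed) is sorted: an entry belongs to the first n of its column exactly when the entry n positions earlier has a different column index, so the mask keeps at most n entries per column.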
Given some m by n grid of 1's and 0's, how would you find how much water would be captured by it, where the 1's are 'walls', and 0's are empty space?
Examples:
[1, 1, 1, 1, 1],
[1, 0, 0, 0, 1],
[1, 0, 0, 0, 1],
[1, 0, 0, 0, 1],
[1, 1, 1, 1, 1]
This grid would capture 9 units of water.
[1, 1, 1, 0, 1],
[1, 0, 0, 0, 1],
[1, 0, 0, 0, 1],
[1, 0, 0, 0, 1],
[1, 1, 1, 1, 1]
However, because this grid has a 'leak' in one of its walls, this would capture 0 units of water.
[1, 1, 1, 0, 1],
[1, 0, 1, 0, 1],
[1, 0, 1, 0, 1],
[1, 0, 1, 0, 1],
[1, 1, 1, 1, 1]
Likewise, because there is a partition between the two sections, the leaky one does not affect the other, and as such this grid would capture 3 units of water.
I'm just really uncertain of how to start on this problem. Are there any algorithms that would be helpful for this? I was thinking depth-first-search or some sort of flood-fill, but now I'm not sure if those are applicable to this exercise.
You can create a list of leaks starting from the positions of 0s on the edges. Then expand that list with 0s that are next to the leaking positions (until no more leaks can be added). Finally, subtract the number of leaks from the total number of zeros in the grid.
def water(G):
    rows = len(G)
    cols = len(G[0])
    # initial leaks are 0s on the edges
    leaks = [(r, c) for r in range(rows) for c in range(cols)
             if G[r][c] == 0 and (r == 0 or c == 0 or r == rows-1 or c == cols-1)]
    for r, c in leaks:
        for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:  # offsets of neighbours
            nr, nc = r + dr, c + dc                        # coordinates of a neighbour
            if nr not in range(rows): continue             # out of bounds
            if nc not in range(cols): continue             # out of bounds
            if G[nr][nc] != 0: continue                    # wall
            if (nr, nc) in leaks: continue                 # already known
            leaks.append((nr, nc))                         # add new leak
    return sum(row.count(0) for row in G) - len(leaks)
Output:
grid = [[1, 1, 1, 1, 1],
        [1, 0, 0, 0, 1],
        [1, 0, 0, 0, 1],
        [1, 0, 0, 0, 1],
        [1, 1, 1, 1, 1]]
print(water(grid))  # 9

grid = [[1, 1, 1, 0, 1],
        [1, 0, 0, 0, 1],
        [1, 0, 0, 0, 1],
        [1, 0, 0, 0, 1],
        [1, 1, 1, 1, 1]]
print(water(grid))  # 0

grid = [[1, 1, 1, 0, 1],
        [1, 0, 1, 0, 1],
        [1, 0, 1, 0, 1],
        [1, 0, 1, 0, 1],
        [1, 1, 1, 1, 1]]
print(water(grid))  # 3
Note that this only looks for leaks in horizontal and vertical (but not diagonal) directions. To manage leaking through diagonals, you'll need to add (-1,-1),(-1,1),(1,-1),(1,1) to the list of offsets.
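For illustration, here is a parameterized sketch (the name water_any is mine, not part of the answer above) that takes the offsets as an argument, so the 4- and 8-connected variants share one implementation; it also uses a set for the membership test instead of scanning the list:

def water_any(G, offsets):
    rows, cols = len(G), len(G[0])
    # initial leaks are 0s on the edges, as in water() above
    leaks = [(r, c) for r in range(rows) for c in range(cols)
             if G[r][c] == 0 and (r == 0 or c == 0 or r == rows-1 or c == cols-1)]
    seen = set(leaks)  # fast "already known" lookup
    for r, c in leaks:
        for dr, dc in offsets:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols \
                    and G[nr][nc] == 0 and (nr, nc) not in seen:
                seen.add((nr, nc))
                leaks.append((nr, nc))
    return sum(row.count(0) for row in G) - len(leaks)

four_way = [(-1, 0), (1, 0), (0, -1), (0, 1)]
eight_way = four_way + [(-1, -1), (-1, 1), (1, -1), (1, 1)]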
Removing zeros starting at the edges, representing the coordinates of zeros with a set (for fast lookup) of complex numbers (for easy neighbor calculation):
def water(G):
    m, n = len(G), len(G[0])
    zeros = {complex(i, j)
             for i in range(m) for j in range(n)
             if G[i][j] == 0}
    for z in list(zeros):
        # start a flood fill from every zero on the border
        if z.real in (0, m-1) or z.imag in (0, n-1):
            q = [z]
            for z in q:
                if z in zeros:
                    zeros.remove(z)
                    for a in range(4):
                        q.append(z + 1j**a)  # the four unit directions
    return len(zeros)
Or with Alain's style of a single BFS, initializing the queue with all edge zeros:
def water(G):
    m, n = len(G), len(G[0])
    zeros = {complex(i, j)
             for i in range(m) for j in range(n)
             if G[i][j] == 0}
    q = [z for z in zeros
         if z.real in (0, m-1) or z.imag in (0, n-1)]
    for z in q:
        if z in zeros:
            zeros.remove(z)
            for a in range(4):
                q.append(z + 1j**a)
    return len(zeros)
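A quick check of either version against the third grid from the question:

grid = [[1, 1, 1, 0, 1],
        [1, 0, 1, 0, 1],
        [1, 0, 1, 0, 1],
        [1, 0, 1, 0, 1],
        [1, 1, 1, 1, 1]]
print(water(grid))  # 3: the right basin leaks, the left one holds 3 units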
I have the following task to solve.
I have an image (a numpy array) where everything that is not the main object is 0 and the main object's pixels hold some counts (let's set all of them to 1).
What I need is the number of pixels on the contour of this object (the red squares with 1 as the value in my illustration). The objects can have different shapes.
Is there any way to achieve it?
Note: the goal is to have a method that adapts to the shape of the figure, because it will be run on multiple images simultaneously.
I propose a solution similar to #user2640045's, using convolution.
We can slide a filter over the array that counts the number of neighbours (left, right, top, bottom):
import numpy as np
from scipy import signal
a = np.array(
[
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0],
[0, 1, 1, 1, 1, 1, 0],
[0, 0, 1, 1, 1, 0, 0],
[0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
]
)
filter = np.array([[0, 1, 0],
[1, 0, 1],
[0, 1, 0]])
Now we convolve the image array with the filter:
conv = signal.convolve2d(a, filter, mode='same')
Every element that has more than zero and less than four neighbors while being active itself is a boundary element:
bounds = a * np.logical_and(conv > 0, conv < 4)
Since a is binary, bounds holds exactly the boundary pixels, so summing it counts them:
>>> bounds.sum()
8
This is interesting, and I have an elegant solution for you.
Since we can agree that a contour pixel is an array value greater than 0 with at least one neighbour equal to 0, we can solve this pretty straightforwardly, and the method is ready for every image you will ever get (as a numpy array, of course...).
import numpy as np
image_pxs = np.array([[0, 0, 0, 0, 0, 0, 0],
                      [0, 0, 0, 1, 0, 0, 0],
                      [0, 0, 1, 1, 1, 0, 0],
                      [0, 1, 1, 1, 1, 1, 0],
                      [0, 0, 1, 1, 1, 0, 0],
                      [0, 0, 0, 1, 0, 0, 0],
                      [0, 0, 0, 0, 0, 0, 0]])

def get_contour(two_d_arr):
    contour_pxs = 0
    # iterate over the array
    for i, row in enumerate(two_d_arr):
        for j, pixel in enumerate(row):
            # check the four neighbours; out of bounds counts as empty
            up = two_d_arr[i-1][j] == 0 if i > 0 else True
            down = two_d_arr[i+1][j] == 0 if i < len(two_d_arr)-1 else True
            left = two_d_arr[i][j-1] == 0 if j > 0 else True
            right = two_d_arr[i][j+1] == 0 if j < len(row)-1 else True
            # if at least one neighbour is empty and the current value > 0, it is contour
            if any((up, down, left, right)) and pixel > 0:
                contour_pxs += pixel  # pixel values are 1, so this counts the pixel
    return contour_pxs

print(get_contour(image_pxs))
The output is of course 8:
8
I have a tensor filled with 0 and 1. Now I want to randomly choose e.g. 50% of the elements which are equal to one. How do I do that?
For example I have the following tensor:
tensor = tf.constant([[0, 0, 1], [0, 1, 0], [1, 1, 0]])
Now I want to randomly choose the coordinates of 50% of the elements which are equal to one (in this case, 2 of the 4). The resulting tensor could look as follows:
[[0, 0, 1], [0, 0, 0], [0, 1, 0]]
You can use numpy.
import numpy as np

tensor = np.array([0, 1, 0, 1, 0, 1, 0, 1])
percentage = 0.5

ones_indices = np.where(tensor == 1)      # tuple holding one array of indices
ones_length = np.shape(ones_indices)[1]   # number of ones
random_indices = np.random.permutation(ones_length)
# indices of a random 50% of the ones
ones_indices[0][random_indices][:int(ones_length * percentage)]
Edit: With your definition of a tensor I have adjusted the code:
import numpy as np

tensor = np.array([[0, 0, 1], [0, 1, 0], [1, 1, 0]])
percentage = 0.5

indices = np.where(tensor == 1)           # (row indices, column indices) of the ones
length = np.shape(indices)[1]
random_idx = np.random.permutation(length)
random_idx = random_idx[:int(length * percentage)]
random_indices = (indices[0][random_idx], indices[1][random_idx])

# build the result: zeros everywhere except at the chosen coordinates
z = np.zeros(np.shape(tensor), dtype=np.int64)
z[random_indices] = 1
z
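If you want to stay in TensorFlow rather than converting to numpy, here is a rough sketch of the same idea (assuming TensorFlow 2.x eager execution):

import tensorflow as tf

tensor = tf.constant([[0, 0, 1], [0, 1, 0], [1, 1, 0]])
percentage = 0.5

ones = tf.where(tensor == 1)                 # coordinates of the ones, shape (4, 2)
n_ones = tf.shape(ones)[0]
k = tf.cast(tf.cast(n_ones, tf.float32) * percentage, tf.int32)
chosen = tf.random.shuffle(ones)[:k]         # random 50% of the coordinates
# scatter 1s back onto a zero tensor of the same shape
result = tf.scatter_nd(chosen,
                       tf.ones([k], dtype=tensor.dtype),
                       tf.shape(tensor, out_type=tf.int64))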
I wonder what is the best way to replace rows that do not satisfy a certain condition with zeros, for sparse matrices. For example (I use plain arrays for illustration):
I want to replace every row whose sum is greater than 10 with a row of zeros
a = np.array([[0,0,0,1,1],
              [1,2,0,0,0],
              [6,7,4,1,0],  # sum > 10
              [0,1,1,0,1],
              [7,3,2,2,8],  # sum > 10
              [0,1,0,1,2]])
I want to replace a[2] and a[4] with zeros, so my output should look like this:
array([[0, 0, 0, 1, 1],
       [1, 2, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 1, 1, 0, 1],
       [0, 0, 0, 0, 0],
       [0, 1, 0, 1, 2]])
This is fairly straightforward for dense matrices:
row_sum = a.sum(axis=1)
to_zero = row_sum > 10
a[to_zero] = np.zeros(a.shape[1])
However, when I try:
s = sparse.csr_matrix(a)
s[to_zero, :] = np.zeros(a.shape[1])
I get this error:
raise NotImplementedError("Fancy indexing in assignment not "
NotImplementedError: Fancy indexing in assignment not supported for csr matrices.
Hence, I need a different solution for sparse matrices. I came up with this:
def zero_out_unfit_rows(s_mat, limit_row_sum):
    row_sum = s_mat.sum(axis=1).T.A[0]        # row sums as a flat array
    to_keep = row_sum <= limit_row_sum
    to_keep = to_keep.astype('int8')
    temp_diag = get_sparse_diag_mat(to_keep)  # diagonal 0/1 "keep" matrix
    return temp_diag * s_mat

def get_sparse_diag_mat(my_diag):
    N = len(my_diag)
    my_diags = my_diag[np.newaxis, :]
    return sparse.dia_matrix((my_diags, [0]), shape=(N, N))
This relies on the fact that if we set the 2nd and 4th diagonal elements of the identity matrix to zero, pre-multiplying by it zeroes the corresponding rows of the matrix.
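A tiny dense illustration of that trick, reusing a from above:

D = np.eye(6)
D[2, 2] = D[4, 4] = 0   # drop rows 2 and 4
D @ a                   # same array as above, with rows 2 and 4 zeroed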
However, I feel that there is a better, more scipynic, solution. Is there a better solution?
Not sure if it is very scipynic, but a lot of the operations on sparse matrices are better done by accessing the guts directly. For your case, I personally would do:
import numpy as np
import scipy.sparse as sps

a = np.array([[0,0,0,1,1],
              [1,2,0,0,0],
              [6,7,4,1,0],  # sum > 10
              [0,1,1,0,1],
              [7,3,2,2,8],  # sum > 10
              [0,1,0,1,2]])
sps_a = sps.csr_matrix(a)
# get sum of each row:
row_sum = np.add.reduceat(sps_a.data, sps_a.indptr[:-1])
# set values to zero
row_mask = row_sum > 10
nnz_per_row = np.diff(sps_a.indptr)
sps_a.data[np.repeat(row_mask, nnz_per_row)] = 0
# ask scipy.sparse to remove the zeroed entries
sps_a.eliminate_zeros()
>>> sps_a.toarray()
array([[0, 0, 0, 1, 1],
       [1, 2, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 1, 1, 0, 1],
       [0, 0, 0, 0, 0],
       [0, 1, 0, 1, 2]])
>>> sps_a.nnz # it does remove the entries, not simply set them to zero
10
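One caveat: np.add.reduceat assumes every row has at least one stored entry. For an empty row, indptr contains a repeated value, and reduceat then returns the single element at that position instead of 0 (a trailing empty row can even raise an IndexError). If empty rows are possible, it is safer to let scipy compute the row sums:

row_sum = np.asarray(sps_a.sum(axis=1)).ravel()  # robust to empty rows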