I have a numpy array where 0 denotes empty space and 1 denotes that a location is filled. I am trying to find a quick method of scanning the array for places where multiple zeros are adjacent to each other and returning the location of the central zero.
For example, if I had the following array:
[0 1 0 1]
[0 0 0 1]
[0 1 0 1]
[1 1 1 1]
I want to return the locations for which there is an adjacent zero on either side of a central zero, e.g.
[1,1]
as this is the central zero of the 3, i.e. there is a zero on either side of the zero at this location.
I'm aware that this can be calculated using if statements, but wondered if there was a more Pythonic way of doing this.
Any help is greatly appreciated
The desired output for arbitrary inputs is not exhaustively specified in the question, but here is a possible approach that might be useful for this kind of problem and can be adapted to the details of the desired output. It uses np.cumsum, np.bincount, np.where, and np.median to find the middle index of groups of consecutive zeros along the rows of a 2D array:
import numpy as np

def find_groups(x, min_size=3, value=0):
    # Compute a sequential label for groups in each row.
    xc = (x != value).cumsum(1)
    # Count the number of occurrences per group in each row.
    counts = np.apply_along_axis(
        lambda row: np.bincount(row, minlength=1 + xc.max()),
        axis=1, arr=xc)
    # Every label above 0 also counts the non-matching element that
    # started the group, so discount it before filtering.
    counts[:, 1:] -= 1
    # Filter by minimum number of occurrences.
    i, j = np.where(counts >= min_size)
    # Compute the median index of each group.
    return [
        (ii, int(np.ceil(np.median(np.where(xc[ii] == jj)[0]))))
        for ii, jj in zip(i, j)
    ]
x = np.array([[0, 1, 0, 1],
              [0, 0, 0, 1],
              [0, 1, 0, 1],
              [1, 1, 1, 1]])
print(find_groups(x))
# [(1, 1)]
It should work properly even for multiple rows with groups of varying sizes, and even multiple groups per row:
x2 = np.array([[0, 1, 0, 1, 1, 1, 1],
               [0, 0, 0, 1, 0, 0, 0],
               [0, 1, 0, 0, 0, 0, 1],
               [0, 0, 0, 0, 0, 0, 0]])
print(find_groups(x2))
# [(1, 1), (1, 5), (2, 3), (3, 3)]
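As a side note, np.apply_along_axis loops over the rows at the Python level. If that ever becomes a bottleneck, here is a sketch of the same idea using scipy.ndimage.label restricted to horizontal connectivity (an alternative technique, not the code above; the tie-break for even-length runs differs slightly from the median-based version):
import numpy as np
from scipy import ndimage

def find_groups_scipy(x, min_size=3, value=0):
    # Label horizontal runs only: the structuring element connects
    # left/right neighbours, but not the rows above and below.
    labels, _ = ndimage.label(x == value, structure=[[0, 0, 0],
                                                     [1, 1, 1],
                                                     [0, 0, 0]])
    out = []
    for rows, cols in ndimage.find_objects(labels):
        if cols.stop - cols.start >= min_size:
            # rows spans a single row here; take the middle column.
            out.append((rows.start, (cols.start + cols.stop - 1) // 2))
    return out

print(find_groups_scipy(x))  # same x as above
# [(1, 1)]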
The question is: how do I get matrix 'a' from 'sum' by using sklearn?
import numpy as np

data_points = np.array([[1, 2, 0], [1, 0, 1], [0, 1, 1], [0, 1, 2], [1, 1, 0]])
centers = np.array([[0, 0, 0], [1, 1, 1], [1, 1, 0], [1, 0, 0]])
sum = np.sum((data_points[:, np.newaxis] - centers)**2, axis=2)
min = np.min(sum, axis=1)
a = np.array([[0, 0, 1, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])
print("The question is how to get matrix 'a' from 'sum' by using sklearn?\n", a)
You can use numpy.argmin to get the index of the first min value:
my_sum = np.sum((data_points[:, np.newaxis] - centers)**2, axis=2)
idx_min = np.argmin(my_sum, axis=1)

# set up output
a = np.zeros(my_sum.shape, dtype=int)
a[np.arange(a.shape[0]), idx_min] = 1
NB: do not use sum and min as variable names; those are Python builtins.
output:
array([[0, 0, 1, 0],
       [0, 1, 0, 0],
       [0, 1, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0]])
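Since the question explicitly asks about sklearn: the argmin over Euclidean distances equals the argmin over squared distances, so sklearn.metrics.pairwise_distances_argmin can replace the manual distance computation. A minimal sketch, assuming scikit-learn is installed:
from sklearn.metrics import pairwise_distances_argmin

# Index of the closest center for each data point (Euclidean by default).
idx_min = pairwise_distances_argmin(data_points, centers)

a = np.zeros((len(data_points), len(centers)), dtype=int)
a[np.arange(len(data_points)), idx_min] = 1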
For a one-dimensional numpy array of 1s and 0s, how can I effectively "mask" the array such that, after the occurrence of a 1, the next n elements of the array are converted to zeros? After those n elements have passed, the pattern repeats, such that the next occurrence of a 1 is preserved, followed again by n zeros.
It is important that the first eligible occurrences of 1 are preserved, so a simple alternating mask such as
[true, false, false, true, ...] won't work.
Furthermore, the data set is massive, so efficiency is important.
I've written crude Python code that gives me the desired results, but it is way too slow for what I need.
Here is an example:
data = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1]
n = 3
newData = []
tail = 0

for x in data:
    if x == 1 and tail <= 0:
        newData.append(1)
        tail = n
    else:
        newData.append(0)
        tail -= 1

print(newData)
newData: [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
Is there possibly a vectorized numpy solution to this problem?
I'm processing tens of thousands of arrays, with more than a million elements in each array. So far using numpy functions has been the only way to manage this.
As far as I know, there is no way to do this entirely in numpy. You could still use numpy to reduce the time spent grabbing the indices, though.
import numpy as np

data = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1]
n = 3

def get_new_data(data, n):
    new_data = np.zeros(len(data))
    non_zero = np.argwhere(data).ravel()
    idx = non_zero[0]
    new_data[idx] = 1
    idx += n
    for i in non_zero[1:]:
        if i > idx:
            new_data[i] = 1
            # Restart the suppression window at the 1 we just kept.
            idx = i + n
    return new_data

get_new_data(data, n)
A function like this should give you a better run time since you are not looping over the whole array.
If this is still not fast enough for you, you can look at using numba, which works very well with numpy and is relatively easy to use.
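For instance, the loop from the question compiles more or less as-is with numba's @njit; this is just a sketch, assuming numba is installed and the input is a numpy array:
import numpy as np
from numba import njit

@njit
def mask_after_ones(data, n):
    out = np.zeros(data.shape[0], dtype=np.int64)
    tail = 0  # remaining positions to suppress after a kept 1
    for i in range(data.shape[0]):
        if data[i] == 1 and tail <= 0:
            out[i] = 1
            tail = n
        else:
            tail -= 1
    return out

print(mask_after_ones(np.array([0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1]), 3))
# [0 0 1 0 0 0 1 0 0 0 0 1]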
You could do it like this:
N = 3
data = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1]
newData = data.copy()
i = 0

while i < len(newData):
    if newData[i] == 1:
        # Zero out the next N elements, clipping the slice at the end of
        # the list (stopping the loop N short would miss trailing 1s).
        stop = min(i + 1 + N, len(newData))
        newData[i + 1:stop] = [0] * (stop - i - 1)
        i += N
    i += 1

print(newData)
# [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0]
Problem Statement
I am trying to write a function that sparsifies a matrix given a target sparsity and an argument called block_shape, which defines the minimum shape of a block of zeros in the matrix. The target doesn't have to be met perfectly, but it should be approached as closely as possible.
For example, given the following arguments,
>>> matrix = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1]
]
>>> target = 0.5
>>> block_shape = (2, 2)
Valid outputs of 50% sparsity could be:
>>> sparse_matrix = sparsify(matrix, target, block_shape)
>>> sparse_matrix
[
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1]
]
>>> sparse_matrix = sparsify(matrix, target, block_shape)
>>> sparse_matrix
[
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1]
]
Note that there could be multiple valid sparsified versions of the input. The only criterion is to get as close to the target as possible. One of the constraints is that only zero blocks of shape block_shape are considered to be sparse.
For example, the matrix below has a sparsity level of 0%, given the arguments above:
>>> sparse_matrix = sparsify(matrix, target, block_shape)
>>> sparse_matrix
[
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0]
]
What I have so far
Currently, I have the following piece of code
import numpy as np

def sparsify(matrix, target, block_shape=None):
    if block_shape is None or block_shape == 1 or block_shape == (1,) or block_shape == (1, 1):
        # 1x1 is just bernoulli with p=target
        probs = np.random.uniform(size=matrix.shape)
        mask = np.zeros(matrix.shape)
        mask[probs >= target] = 1.0
    else:
        if isinstance(block_shape, int):
            block_shape = (block_shape, block_shape)
        if len(block_shape) == 1:
            block_shape = (block_shape[0], block_shape[0])
        mask = np.ones(matrix.shape)
        rows, cols = matrix.shape
        for row in range(rows):
            for col in range(cols):
                submask = mask[row:row + block_shape[0], col:col + block_shape[1]]
                if submask.shape != block_shape:
                    # we don't care about the edges, cannot partially sparsify
                    continue
                if (submask == 0).any():
                    # If current (row, col) is already in the sparsified area, skip
                    continue
                prob = np.random.random()
                if prob < target:
                    submask[:, :] = np.zeros(submask.shape)
    return matrix * mask, mask
The problem with the code above is that it does not match the target if the block size is not (1, 1):
>>> matrix = np.random.randn(100, 100)
>>> matrix, mask = sparsify(matrix, target=0.5, block_shape=(2, 2))
>>> print((matrix == 0).mean())
0.73
>>> print((mask == 0).mean())
0.73
Reason for discrepancy (I think)
I am not sure why I am not getting the target I expect, but I think it has something to do with the fact that I check the probability of every element instead of the block as a whole. However, I have skipping conditions in my code, so I thought that would cover it.
Edits
Edit 1 -- additional examples
Just giving some more examples.
Example 1: Given a different block size
>>> sparse_matrix = sparsify(matrix, 0.25, (3, 3))
>>> sparse_matrix
[
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1]
]
The example above is a valid sparse matrix, although the level of sparsity is not 25%; another valid result could be a matrix of all 1's.
Example 2: Given a different block size and target
>>> sparse_matrix = sparsify(matrix, 0.6, (1, 2))
>>> sparse_matrix
[
    [0, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0]
]
Notice that all the zeros can be put in blocks of shape (1, 2), and the sparsity level is 60%.
Edit 2 -- forgot a constraint
Another constraint that I forgot to mention, but tried incorporating into my code is that the zero blocks must be non-overlapping.
Example 1: The result below is NOT valid
>>> sparse_matrix = sparsify(matrix, 0.5, (2, 2))
>>> sparse_matrix
[
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1]
]
Although the blocks starting at index (0, 0) and (1, 1) have valid zero-shapes, the result does not meet the requirements. The reason is that only one of those blocks can be considered valid. If we label the zero blocks as z0 and z1, here is what this matrix is:
[
    [z0, z0,  1, 1],
    [z0, z0, z1, 1],
    [ 1, z1, z1, 1],
    [ 1,  1,  1, 1]
]
The element at (1, 1) can be treated as belonging to either z0 or z1. That means there is only one valid sparse block, which puts the level of sparsity at 25% (not ~44%).
The probability of becoming 0 is not equal for all elements.
For example, with block_shape (2, 2), matrix[0, 0] becomes 0 with probability target, since the loop only passes over it once. matrix[1, 0] has a probability higher than target, since the loop passes over it twice. Similarly, matrix[1, 1] has a probability higher than matrix[1, 0], because the loop sees it four times, at (0, 0), (1, 0), (0, 1), and (1, 1).
This also happens in the middle of the matrix due to prior loop operations.
So the main variable affecting the result is the block_shape.
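To make that concrete with a rough back-of-the-envelope estimate (ignoring the skip conditions): if a cell can be covered by k different block placements, each zeroing it independently with probability target, it ends up 0 with probability 1 - (1 - target)**k:
target = 0.5
# k = number of block placements covering the cell; for a (2, 2) block:
# (0, 0) -> 1, (1, 0) -> 2, (1, 1) -> 4
for k in (1, 2, 4):
    print(k, 1 - (1 - target) ** k)
# 1 0.5
# 2 0.75
# 4 0.9375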
I've been fiddling around for a bit, and here's an alternative way using a while loop instead of a for loop: keep simulating until you reach the target probability within some tolerance err. You just need to watch out for an infinite loop caused by making err too small.
import numpy as np

def sparsify(matrix, target, block_shape=None):
    if block_shape is None or block_shape == 1 or block_shape == (1,) or block_shape == (1, 1):
        # 1x1 is just bernoulli with p=target
        probs = np.random.uniform(size=matrix.shape)
        mask = np.zeros(matrix.shape)
        mask[probs >= target] = 1.0
    else:
        if isinstance(block_shape, int):
            block_shape = (block_shape, block_shape)
        if len(block_shape) == 1:
            block_shape = (block_shape[0], block_shape[0])
        mask = np.ones(matrix.shape)
        rows, cols = matrix.shape

        # vars for probability check
        total = float(rows * cols)
        zero_cnt = total - np.count_nonzero(matrix)
        err = 0.005  # .5%

        # simulate until we reach the target probability range
        while not target - err < (zero_cnt / total) < target + err:
            # pick a random point in the matrix
            row = np.random.randint(rows)
            col = np.random.randint(cols)

            # submask = mask[row:row + block_shape[0], col:col + block_shape[1]]
            submask = matrix[row:row + block_shape[0], col:col + block_shape[1]]
            if submask.shape != block_shape:
                # we don't care about the edges, cannot partially sparsify
                continue
            if (submask == 0).any():
                # If current (row, col) is already in the sparsified area, skip
                continue

            # need more 0s to reach the target probability range
            if zero_cnt / total < target - err:
                matrix[row:row + block_shape[0], col:col + block_shape[1]] = 0
            # need more 1s to reach the target probability range
            else:
                matrix[row:row + block_shape[0], col:col + block_shape[1]] = 1

            # update 0 count
            zero_cnt = total - np.count_nonzero(matrix)
    return matrix * mask, mask
Notes:
- Didn't check for any optimization or code refactoring.
- Didn't use the mask var; worked on the matrix directly.
matrix = np.ones((100, 100))
matrix, mask = sparsify(matrix, target=0.5, block_shape=(2, 2))
print((matrix == 0).mean())
# prints somewhere between target - err and target + err
# likely to see a lower value in the range since we're counting up (0s)
Given a 2D numpy array consisting of ones and zeros, I want to find every index where the value is one and where the value to its top, left, right, or bottom is a zero. For example, in this array
0 0 0 0 0
0 0 1 0 0
0 1 1 1 0
0 0 1 0 0
0 0 0 0 0
I only want coordinates for (1,2), (2,1), (2,3) and (3,2) but not for (2,2).
I have created code that works and produces two lists of coordinates, similar to the numpy nonzero method; however, I don't think it's very "pythonic", and I was hoping there was a better and more efficient way to solve this problem. (Note: this only works on arrays padded with zeros.)
from numpy import nonzero
...
array = ...  # A numpy array consisting of zeros and ones
non_zeros_pairs = nonzero(array)
coordinate_pairs = [[], []]
for x, y in zip(non_zeros_pairs[0], non_zeros_pairs[1]):
    if array[x][y+1] == 0 or array[x][y-1] == 0 or array[x+1][y] == 0 or array[x-1][y] == 0:
        coordinate_pairs[0].append(x)
        coordinate_pairs[1].append(y)
...
If there exist methods in numpy that can handle this for me, that would be awesome. If this question has already been asked/answered on Stack Overflow before, I will gladly remove this; I just struggled to find anything. Thank you.
Setup
import scipy.signal
import numpy as np

a = np.array([[0, 0, 0, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 1, 1, 1, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 0, 0, 0]])
Create a window which matches the four directions from each value, and convolve. Then you can check whether an element is 1 and its convolution is less than 4, since a value of 4 means the element was surrounded by 1s:
window = np.array([[0, 1, 0],
                   [1, 0, 1],
                   [0, 1, 0]])

m = scipy.signal.convolve2d(a, window, mode='same', fillvalue=1)
v = np.where(a & (m < 4))

list(zip(*v))
# [(1, 2), (2, 1), (2, 3), (3, 2)]
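If you would rather avoid scipy, here is a pure-numpy sketch of the same idea: pad the array with a border of 1s (mirroring fillvalue=1 above) and compare the four shifted neighbour views directly:
p = np.pad(a, 1, constant_values=1)  # border of 1s, like fillvalue=1
edge = a.astype(bool) & ((p[:-2, 1:-1] == 0) |   # neighbour above
                         (p[2:, 1:-1] == 0) |    # neighbour below
                         (p[1:-1, :-2] == 0) |   # neighbour to the left
                         (p[1:-1, 2:] == 0))     # neighbour to the right
list(zip(*np.where(edge)))
# [(1, 2), (2, 1), (2, 3), (3, 2)]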
Given a numpy array (let it be a bit array for simplicity), how can I construct a new array of the same shape where a 1 stands exactly at the positions where the original array has a zero preceded by at least N-1 consecutive zeros?
For example, what is the best way to implement function nzeros having two arguments, a numpy array and the minimal required number of consecutive zeros:
import numpy as np
a = np.array([0, 0, 0, 0, 1, 0, 0, 0, 1, 1])
b = nzeros(a, 3)
Function nzeros(a, 3) should return
array([0, 0, 1, 1, 0, 0, 0, 1, 0, 0])
Approach #1
We can use 1D convolution -
def nzeros(a, n):
    # Define kernel for 1D convolution
    k = np.ones(n, dtype=int)
    # Get sliding summations for zero matches with that kernel
    s = np.convolve(a == 0, k)
    # Look for summations that are equal to n, which will occur for
    # n consecutive 0s. Remember that we are using a "full" version of
    # convolution, so there's one-off offsetting because of the way the kernel
    # slides across the input data. Also, we need to create 1s at places where
    # n consecutive 0s end, so we need to slice off the trailing elements.
    # Thus, we end up with the following after int dtype conversion:
    return (s == n).astype(int)[:-n+1]
Sample run -
In [46]: a
Out[46]: array([0, 0, 0, 0, 1, 0, 0, 0, 1, 1])
In [47]: nzeros(a,3)
Out[47]: array([0, 0, 1, 1, 0, 0, 0, 1, 0, 0])
In [48]: nzeros(a,2)
Out[48]: array([0, 1, 1, 1, 0, 0, 1, 1, 0, 0])
Approach #2
Another way to solve this, which could be considered a variant of the 1D convolution approach, is to use erosion: looking at the outputs, we can simply erode the mask of 0s from the start of each run up to n-1 places in. For that we can use scipy.ndimage's binary_erosion, which also allows us to specify the placement of the kernel center with its origin arg, so we avoid any slicing. The implementation would look something like this -
from scipy.ndimage import binary_erosion

out = binary_erosion(a == 0, np.ones(n), origin=(n - 1) // 2).astype(int)
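For the sample input from the question, this should reproduce the expected output:
a = np.array([0, 0, 0, 0, 1, 0, 0, 0, 1, 1])
n = 3
print(binary_erosion(a == 0, np.ones(n), origin=(n - 1) // 2).astype(int))
# [0 0 1 1 0 0 0 1 0 0]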
Using a for loop:
def nzeros(a, n):
    # Create a numpy array of zeros of length equal to n
    b = np.zeros(n)
    # Create a numpy array of zeros of same length as array a
    c = np.zeros(len(a), dtype=int)
    # Note the + 1 so the window ending at the last element is checked too.
    for i in range(len(a) - n + 1):
        if (b == a[i:i+n]).all():  # Check if array b is equal to slice in a
            c[i+n-1] = 1
    return c
Sample Output:
print(nzeros(a, 3))
[0 0 1 1 0 0 0 1 0 0]