Unable to retrieve required indices from multiple NumPy arrays
I have four NumPy arrays of the same shape (2-D). I need to find the indices of the last array (d) where the elements of d are smaller than 20, but only at positions where the elements of array a are 1 and the elements of arrays b and c are not 1.
Here is what I tried:
mask = (a == 1)|(b != 1)|(c != 1)
answer = d[mask | d < 20]
Then I need to set those regions of d to 1, and all other regions of d to 0.
d[answer] = 1
d[d!=1] = 0
print(d)
I could not solve this problem. How do you solve it?
import numpy as np
a = np.array([[0,0,0,1,1,1,1,1,0,0,0],
[0,0,0,1,1,1,1,1,0,0,0],
[0,0,0,1,1,1,1,1,0,0,0],
[0,0,0,1,1,1,1,1,0,0,0],
[0,0,0,1,1,1,1,1,0,0,0],
[0,0,0,1,1,1,1,1,0,0,0]])
b = np.array([[0,0,0,1,1,0,0,0,0,0,0],
[0,0,0,0,0,0,1,1,0,0,0],
[0,0,0,1,0,1,0,0,0,0,0],
[0,0,0,1,1,1,0,1,0,0,0],
[0,0,0,0,0,0,1,0,0,0,0],
[0,0,0,0,1,0,1,0,0,0,0]])
c = np.array([[0,0,0,0,0,0,1,0,0,0,0],
[0,0,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,0,1,1,0,0,0],
[0,0,0,0,0,0,1,0,0,0,0],
[0,0,0,0,1,0,0,0,0,0,0],
[0,0,0,0,0,1,0,0,0,0,0]])
d = np.array([[0,56,89,67,12,28,11,12,14,8,240],
[1,57,89,67,18,25,11,12,14,9,230],
[4,51,89,87,19,20,51,92,54,7,210],
[6,46,89,67,51,35,11,12,14,6,200],
[8,36,89,97,43,67,81,42,14,1,220],
[9,16,89,67,49,97,11,12,14,2,255]])
The conditions should be AND-ed together, not OR-ed. You can first build the Boolean mask representing the desired region, and then modify d based on it:
mask = (a == 1) & (b != 1) & (c != 1) & (d < 20)
d[mask] = 1
d[~mask] = 0
print(d)
Output:
[[0 0 0 0 0 0 0 1 0 0 0]
[0 0 0 0 1 0 0 0 0 0 0]
[0 0 0 0 1 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 1 0 0 0]]
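As a side note, if you would rather not overwrite d in place, np.where builds the 0/1 array in one step from the same mask:

result = np.where((a == 1) & (b != 1) & (c != 1) & (d < 20), 1, 0)
print(result)

This leaves d intact and prints the same output as above.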
Related
Write functions resilient to variable dimension array
I'm struggling to write a function that applies seamlessly to a numpy array whatever its dimension. At one point in my code, I have boolean arrays that I consider as masks for other arrays (0 = not passing, 1 = passing). I would like to "enlarge" those mask arrays by overriding zeros adjacent to ones within a defined range. Example:

input         = [0,0,0,0,0,1,0,0,0,0,1,0,0,0]
enlarged_by_1 = [0,0,0,0,1,1,1,0,0,1,1,1,0,0]
enlarged_by_2 = [0,0,0,1,1,1,1,1,1,1,1,1,1,0]

input = [[0,0,0,1,0,0,1,0],
         [0,1,0,0,0,0,0,0],
         [0,0,0,0,0,0,1,0]]
enlarged_by_1 = [[0,0,1,1,1,1,1,1],
                 [1,1,1,0,0,0,0,0],
                 [0,0,0,0,0,1,1,1]]

This is pretty straightforward when the input is 1-D. However, I would like this function to seamlessly take 1-D arrays, matrices, 3-D arrays, and so on. For a matrix, the same logic would be applied to each line. I read about Ellipsis, but it does not seem applicable in my case. Flattening the input, applying the logic, and reshaping the array would lead to possible contamination between individual arrays. I do not want to go through testing the shape of the input numpy array / writing a recursive function, as that does not seem very clean to me. Would you have some suggestions?
The operation that you describe is very much like a convolution followed by clipping, to ensure that values remain 0 or 1. For your example input:

import numpy as np

input = np.array([0,0,0,0,0,1,0,0,0,0,1,0,0,0], dtype=int)
print(input)

def enlarge_ones(x, k):
    mask = np.ones(2*k+1, dtype=int)
    return np.clip(np.convolve(x, mask, mode='same'), 0, 1).astype(int)

print(enlarge_ones(input, k=1))
print(enlarge_ones(input, k=3))

which yields

[0 0 0 0 0 1 0 0 0 0 1 0 0 0]
[0 0 0 0 1 1 1 0 0 1 1 1 0 0]
[0 0 1 1 1 1 1 1 1 1 1 1 1 1]

numpy.convolve only works for 1-D arrays. However, one can imagine a loop over the number of array dimensions and another loop over each 1-D sub-array. In other words, for a 2-D matrix, first operate on every row and then on every column. You get the idea for nd-arrays with more dimensions. With that, enlarge_ones becomes something like:

def enlarge_ones(x, k):
    n = len(x.shape)
    if n == 1:
        mask = np.ones(2*k+1, dtype=int)
        return np.clip(np.convolve(x, mask, mode='same')[:len(x)], 0, 1).astype(int)
    else:
        x = x.copy()
        for d in range(n):
            for i in np.ndindex(x.shape[:-1]):
                x[i] = enlarge_ones(x[i], k)  # x[i] is 1-d
            x = x.transpose(list(range(1, n)) + [0])
        return x

Note the use of np.transpose to rotate the dimensions so that np.convolve is applied along each dimension in turn. The transpose is applied exactly n times, which returns the array to its original shape at the end.

x = np.zeros((3, 5, 7), dtype=int)
x[1, 2, 2] = 1
print(x)
print(enlarge_ones(x, k=1))

[[[0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0]]

 [[0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0]
  [0 0 1 0 0 0 0]
  [0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0]]

 [[0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0]]]

[[[0 0 0 0 0 0 0]
  [0 1 1 1 0 0 0]
  [0 1 1 1 0 0 0]
  [0 1 1 1 0 0 0]
  [0 0 0 0 0 0 0]]

 [[0 0 0 0 0 0 0]
  [0 1 1 1 0 0 0]
  [0 1 1 1 0 0 0]
  [0 1 1 1 0 0 0]
  [0 0 0 0 0 0 0]]

 [[0 0 0 0 0 0 0]
  [0 1 1 1 0 0 0]
  [0 1 1 1 0 0 0]
  [0 1 1 1 0 0 0]
  [0 0 0 0 0 0 0]]]
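For an alternative that side-steps the recursion entirely: if SciPy is available, this enlargement is a binary dilation, which scipy.ndimage performs for any number of dimensions. A minimal sketch, assuming (as in the question's 2-D example) that the enlargement should happen only within each 1-D line:

import numpy as np
from scipy import ndimage

def enlarge_ones_nd(x, k):
    # structuring element spanning 2k+1 cells along the last axis only,
    # so there is no contamination between neighbouring lines
    structure = np.ones((1,) * (x.ndim - 1) + (2 * k + 1,), dtype=bool)
    return ndimage.binary_dilation(x.astype(bool), structure=structure).astype(int)

x = np.array([[0,0,0,1,0,0,1,0],
              [0,1,0,0,0,0,0,0],
              [0,0,0,0,0,0,1,0]])
print(enlarge_ones_nd(x, k=1))

For the all-axes behaviour of the recursive answer above, a full box np.ones((2*k+1,) * x.ndim, dtype=bool) as the structuring element should give the same result, since dilation by a box decomposes into successive 1-D dilations along each axis.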
How to create an NxM matrix with each column value in range(x,y)?
Problem: I want to create a numpy matrix with five columns, each column's values restricted to a range. I can't find any solution online for this problem. I'm trying to generate a list of rules of the form

Rule: (wordIndex, row, col, dh, dv)

with the columns taking values in the ranges (0 to 7), (0 to 11), (0 to 11), (-1 to 1), (-1 to 1) respectively. I want to generate all possible combinations. I could easily build the matrix using five nested loops:

m, n = 12, 12
rules = []
for wordIndex in range(0, 15):
    for row in range(0, m):
        for col in range(0, n):
            for dh in range(-1, 2):
                for dv in range(-1, 2):
                    rules.append([wordIndex, row, col, dh, dv])

But this approach takes a very long time (the number of rules is the product of all the range sizes), and I wonder if there's a better, vectorized approach using numpy. I've tried the following, but none of it seems to work:

rules = np.mgrid[words[0]:words[-1], 0:11, 0:11, -1:1, -1:1]
rules = np.rollaxis(words, 0, 4)
rules = rules.reshape((len(words)*11*11*3*3, 5))

Another approach that fails:

values = list(itertools.product(len(wordsGiven()), range(11), range(11), range(-1,1), range(-1,1)))

I also tried np.arange() but can't figure out how to use it for a multidimensional array.
I think there should be a better way to do it. But just in case you cannot find one, here is a hacky array-based way:

shape = (8-0, 12-0, 12-0, 2-(-1), 2-(-1))
a = np.zeros(shape)
# create array of indices
a = np.argwhere(a==0).reshape(*shape, len(shape))
# correct the ranges that do not start from 0; here the 4th and 5th columns (dh and dv)
# are shifted by -1 (the starting value of their range).
# You can adjust this for any other ranges and columns easily.
a[:,:,:,:,:,3:5] -= 1

First few elements of a:

[[[[[[ 0  0  0 -1 -1]
     [ 0  0  0 -1  0]
     [ 0  0  0 -1  1]]

    [[ 0  0  0  0 -1]
     [ 0  0  0  0  0]
     [ 0  0  0  0  1]]

    [[ 0  0  0  1 -1]
     [ 0  0  0  1  0]
     [ 0  0  0  1  1]]]

   [[[ 0  0  1 -1 -1]
     [ 0  0  1 -1  0]
     [ 0  0  1 -1  1]]

    [[ 0  0  1  0 -1]
     [ 0  0  1  0  0]
     [ 0  0  1  0  1]]

    [[ 0  0  1  1 -1]
     [ 0  0  1  1  0]
     [ 0  0  1  1  1]]]

   [[[ 0  0  2 -1 -1]
     [ 0  0  2 -1  0]
     [ 0  0  2 -1  1]]

    [[ 0  0  2  0 -1]
     [ 0  0  2  0  0]
     [ 0  0  2  0  1]]

    [[ 0  0  2  1 -1]
     [ 0  0  2  1  0]
     [ 0  0  2  1  1]]]
  ...
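For what it's worth, the approaches the question already tried look close to working; the usual pitfall is that Python range and np.mgrid stops are exclusive, so range(-1, 1) never produces 1. A hedged sketch of both fixed versions, with bounds taken from the ranges quoted in the question (adjust if your actual word count differs):

import itertools
import numpy as np

# itertools.product, with stops bumped by one so the upper bounds are included
rules = np.array(list(itertools.product(
    range(8), range(12), range(12), range(-1, 2), range(-1, 2))))

# equivalent mgrid construction: 5 index grids, stacked and flattened to rows
rules_mgrid = np.mgrid[0:8, 0:12, 0:12, -1:2, -1:2].reshape(5, -1).T

print(rules.shape)                         # (10368, 5)
print(np.array_equal(rules, rules_mgrid))  # True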
Would like to vectorize while loop for performance
I am trying to set values for a window of an array based on the current value of another array. It should ignore values that the window overrides. I need to be able to change the size of the window for different runs. The following works, but it is very slow, and I thought there would be a vectorized solution somewhere.

window_size = 3

def signal(self):
    signal = pd.Series(data=0, index=arr.index)
    i = 0
    while i < len(self.arr) - 1:
        s = self.arr.iloc[i]
        if s in [-1, 1]:
            j = i + window_size
            signal.iloc[i: j] = s
            i = i + window_size
        else:
            i += 1
    return signal

arr    = [0 0 0 0 1 0 0 0 0 0 0 -1 -1 0 0 0 0]
signal = [0 0 0 0 1 1 1 0 0 0 0 -1 -1 -1 0 0 0]
You could use the shift function of pd.Series:

arr_series = pd.Series(arr)
arr_series + arr_series.shift(periods=1, fill_value=0) + arr_series.shift(periods=2, fill_value=0)
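If the window size needs to stay configurable, one way (a sketch of my own, not part of the original answer) is to sum shifted copies in a loop and clip the result, since overlapping windows can otherwise stack up:

import pandas as pd

def signal_shifted(arr, window_size=3):
    s = pd.Series(arr)
    total = sum(s.shift(i, fill_value=0) for i in range(window_size))
    # clip in case two signals fall within one window of each other
    return total.clip(-1, 1)

Note that, like the two-shift one-liner above, this does not skip signals that an earlier window already covered, so where the while-loop version ignores the second -1 in the example, this version extends the window by one extra step.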
Python numpy zeros array being assigned 1 for every value when only one index is updated
The following is my code:

amount_features = X.shape[1]
best_features = np.zeros((amount_features,), dtype=int)
best_accuracy = 0
best_accuracy_index = 0

def find_best_features(best_features, best_accuracy):
    for i in range(amount_features):
        trial_features = best_features
        trial_features[i] = 1
        svc = SVC(C=10, gamma=.1)
        svc.fit(X_train[:, trial_features==1], y_train)
        y_pred = svc.predict(X_test[:, trial_features==1])
        accuracy = metrics.accuracy_score(y_test, y_pred)
        if accuracy > best_accuracy:
            best_accuracy = accuracy
            best_accuracy_index = i
    print(best_accuracy_index)
    best_features[best_accuracy_index] = 1
    return best_features, best_accuracy

bf, ba = find_best_features(best_features, best_accuracy)
print(bf, ba)

And this is my output:

25
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1] 0.865853658537

And my expected output:

25
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0] 0.865853658537

I am trying to update the zeros array with the index that gives the highest accuracy. As you can see, it should be index 25, and I follow that by setting index 25 of my array to 1. However, when I print the array it shows every index has been set to 1. Not sure what the mishap is. Thanks for spending your limited time on Earth to help me.
Change trial_features = best_features to trial_features = np.copy(best_features). The reasoning behind the change was already given by @Michael Butscher: trial_features = best_features does not copy the array, it just binds another name to the same object, so trial_features[i] = 1 mutates best_features on every iteration.
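For illustration (my own minimal example, not from the original thread), this is the aliasing the fix addresses:

import numpy as np

a = np.zeros(5, dtype=int)
b = a            # b is another name for the same array, not a copy
b[2] = 1
print(a)         # [0 0 1 0 0] -- a changed too

c = np.copy(a)   # an independent copy
c[4] = 1
print(a)         # still [0 0 1 0 0] -- a is unaffected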
bounding box of numpy array
Suppose you have a 2D numpy array with some random values and surrounding zeros. Example "tilted rectangle":

import numpy as np
from skimage import transform

img1 = np.zeros((100,100))
img1[25:75,25:75] = 1.
img2 = transform.rotate(img1, 45)

Now I want to find the smallest bounding rectangle for all the nonzero data. For example:

a = np.where(img2 != 0)
bbox = img2[np.min(a[0]):np.max(a[0])+1, np.min(a[1]):np.max(a[1])+1]

What would be the fastest way to achieve this result? I am sure there is a better way, since the np.where function takes quite some time with e.g. 1000x1000 data sets.

Edit: Should also work in 3D...
You can roughly halve the execution time by using np.any to reduce the rows and columns that contain non-zero values to 1D vectors, rather than finding the indices of all non-zero values using np.where:

def bbox1(img):
    a = np.where(img != 0)
    bbox = np.min(a[0]), np.max(a[0]), np.min(a[1]), np.max(a[1])
    return bbox

def bbox2(img):
    rows = np.any(img, axis=1)
    cols = np.any(img, axis=0)
    rmin, rmax = np.where(rows)[0][[0, -1]]
    cmin, cmax = np.where(cols)[0][[0, -1]]
    return rmin, rmax, cmin, cmax

Some benchmarks:

%timeit bbox1(img2)
10000 loops, best of 3: 63.5 µs per loop

%timeit bbox2(img2)
10000 loops, best of 3: 37.1 µs per loop

Extending this approach to the 3D case just involves performing the reduction along each pair of axes:

def bbox2_3D(img):
    r = np.any(img, axis=(1, 2))
    c = np.any(img, axis=(0, 2))
    z = np.any(img, axis=(0, 1))
    rmin, rmax = np.where(r)[0][[0, -1]]
    cmin, cmax = np.where(c)[0][[0, -1]]
    zmin, zmax = np.where(z)[0][[0, -1]]
    return rmin, rmax, cmin, cmax, zmin, zmax

It's easy to generalize this to N dimensions by using itertools.combinations to iterate over each unique combination of axes to perform the reduction over:

import itertools

def bbox2_ND(img):
    N = img.ndim
    out = []
    for ax in itertools.combinations(reversed(range(N)), N - 1):
        nonzero = np.any(img, axis=ax)
        out.extend(np.where(nonzero)[0][[0, -1]])
    return tuple(out)

If you know the coordinates of the corners of the original bounding box, the angle of rotation, and the centre of rotation, you could get the coordinates of the transformed bounding box corners directly by computing the corresponding affine transformation matrix and dotting it with the input coordinates:

def bbox_rotate(bbox_in, angle, centre):
    rmin, rmax, cmin, cmax = bbox_in
    # bounding box corners in homogeneous coordinates
    xyz_in = np.array([[cmin, cmin, cmax, cmax],
                       [rmin, rmax, rmin, rmax],
                       [   1,    1,    1,    1]])
    # translate centre to origin
    cr, cc = centre
    cent2ori = np.eye(3)
    cent2ori[:2, 2] = -cr, -cc
    # rotate about the origin
    theta = np.deg2rad(angle)
    rmat = np.eye(3)
    rmat[:2, :2] = np.array([[ np.cos(theta), -np.sin(theta)],
                             [ np.sin(theta),  np.cos(theta)]])
    # translate from origin back to centre
    ori2cent = np.eye(3)
    ori2cent[:2, 2] = cr, cc
    # combine transformations (rightmost matrix is applied first)
    xyz_out = ori2cent.dot(rmat).dot(cent2ori).dot(xyz_in)
    r, c = xyz_out[:2]
    rmin = int(r.min())
    rmax = int(r.max())
    cmin = int(c.min())
    cmax = int(c.max())
    return rmin, rmax, cmin, cmax

This works out to be very slightly faster than using np.any for your small example array:

%timeit bbox_rotate([25, 75, 25, 75], 45, (50, 50))
10000 loops, best of 3: 33 µs per loop

However, since the speed of this method is independent of the size of the input array, it can be quite a lot faster for larger arrays.
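One usage note worth adding here (my addition, not part of the original answer): the bounds these functions return are inclusive, so cropping with them needs a +1 on the upper indices, just like the np.where version in the question:

rmin, rmax, cmin, cmax = bbox2(img2)
cropped = img2[rmin:rmax + 1, cmin:cmax + 1]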
Extending the transformation approach to 3D is slightly more complicated, in that the rotation now has three different components (one about the x-axis, one about the y-axis and one about the z-axis), but the basic method is the same:

def bbox_rotate_3d(bbox_in, angle_x, angle_y, angle_z, centre):
    rmin, rmax, cmin, cmax, zmin, zmax = bbox_in
    # bounding box corners in homogeneous coordinates
    xyzu_in = np.array([[cmin, cmin, cmin, cmin, cmax, cmax, cmax, cmax],
                        [rmin, rmin, rmax, rmax, rmin, rmin, rmax, rmax],
                        [zmin, zmax, zmin, zmax, zmin, zmax, zmin, zmax],
                        [   1,    1,    1,    1,    1,    1,    1,    1]])
    # translate centre to origin
    cr, cc, cz = centre
    cent2ori = np.eye(4)
    cent2ori[:3, 3] = -cr, -cc, -cz
    # rotation about the x-axis
    theta = np.deg2rad(angle_x)
    rmat_x = np.eye(4)
    rmat_x[1:3, 1:3] = np.array([[ np.cos(theta), -np.sin(theta)],
                                 [ np.sin(theta),  np.cos(theta)]])
    # rotation about the y-axis
    theta = np.deg2rad(angle_y)
    rmat_y = np.eye(4)
    rmat_y[[0, 0, 2, 2], [0, 2, 0, 2]] = (
        np.cos(theta), np.sin(theta), -np.sin(theta), np.cos(theta))
    # rotation about the z-axis
    theta = np.deg2rad(angle_z)
    rmat_z = np.eye(4)
    rmat_z[:2, :2] = np.array([[ np.cos(theta), -np.sin(theta)],
                               [ np.sin(theta),  np.cos(theta)]])
    # translate from origin back to centre
    ori2cent = np.eye(4)
    ori2cent[:3, 3] = cr, cc, cz
    # combine transformations (rightmost matrix is applied first)
    tform = ori2cent.dot(rmat_z).dot(rmat_y).dot(rmat_x).dot(cent2ori)
    xyzu_out = tform.dot(xyzu_in)
    r, c, z = xyzu_out[:3]
    rmin = int(r.min())
    rmax = int(r.max())
    cmin = int(c.min())
    cmax = int(c.max())
    zmin = int(z.min())
    zmax = int(z.max())
    return rmin, rmax, cmin, cmax, zmin, zmax

I've essentially just modified the function above using the rotation matrix expressions from here - I haven't had time to write a test case yet, so use with caution.
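Since that function is flagged as untested, a cheap sanity check (my suggestion) is that zero rotation angles must return the input box unchanged, because all three rotation matrices collapse to the identity and the two translations cancel:

bbox = (10, 20, 10, 20, 10, 20)
assert bbox_rotate_3d(bbox, 0, 0, 0, (15, 15, 15)) == bbox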
Here is an algorithm to calculate the bounding box for N dimensional arrays:

def get_bounding_box(x):
    """ Calculates the bounding box of a ndarray"""
    mask = x == 0
    bbox = []
    all_axis = np.arange(x.ndim)
    for kdim in all_axis:
        nk_dim = np.delete(all_axis, kdim)
        mask_i = mask.all(axis=tuple(nk_dim))
        dmask_i = np.diff(mask_i)
        idx_i = np.nonzero(dmask_i)[0]
        if len(idx_i) != 2:
            raise ValueError('Algorithm failed, {} does not have 2 elements!'.format(idx_i))
        bbox.append(slice(idx_i[0]+1, idx_i[1]+1))
    return bbox

which can be used with 2D, 3D, etc. arrays as follows:

In [1]: print((img2!=0).astype(int))
   ...: bbox = get_bounding_box(img2)
   ...: print((img2[bbox]!=0).astype(int))

[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0]
 [0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0]
 [0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0]
 [0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0]
 [0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0]
 [0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0]
 [0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0]
 [0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]

[[0 0 0 0 0 0 1 1 0 0 0 0 0 0]
 [0 0 0 0 0 1 1 1 1 0 0 0 0 0]
 [0 0 0 0 1 1 1 1 1 1 0 0 0 0]
 [0 0 0 1 1 1 1 1 1 1 1 0 0 0]
 [0 0 1 1 1 1 1 1 1 1 1 1 0 0]
 [0 1 1 1 1 1 1 1 1 1 1 1 1 0]
 [1 1 1 1 1 1 1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1 1 1 1 1 1 1]
 [0 1 1 1 1 1 1 1 1 1 1 1 1 0]
 [0 0 1 1 1 1 1 1 1 1 1 1 0 0]
 [0 0 0 1 1 1 1 1 1 1 1 0 0 0]
 [0 0 0 0 1 1 1 1 1 1 0 0 0 0]
 [0 0 0 0 0 1 1 1 1 0 0 0 0 0]
 [0 0 0 0 0 0 1 1 0 0 0 0 0 0]]

Although replacing the np.diff and np.nonzero calls with one np.where might be better.
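Picking up that closing remark, here is a sketch of what the np.where variant might look like (my reading of the suggestion, so treat it as an assumption): reduce every other axis with any(), then take the first and last surviving index. Unlike the diff-based version, this also works when the non-zero region touches the array border:

def get_bounding_box_where(x):
    bbox = []
    for kdim in range(x.ndim):
        other_axes = tuple(i for i in range(x.ndim) if i != kdim)
        # indices along kdim where at least one element is non-zero
        nonzero = np.where((x != 0).any(axis=other_axes))[0]
        bbox.append(slice(nonzero[0], nonzero[-1] + 1))
    return tuple(bbox)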
I was able to squeeze out a little more performance by replacing np.where with np.argmax and working on a boolean mask.

def bbox(img):
    img = (img > 0)
    rows = np.any(img, axis=1)
    cols = np.any(img, axis=0)
    rmin, rmax = np.argmax(rows), img.shape[0] - 1 - np.argmax(np.flipud(rows))
    cmin, cmax = np.argmax(cols), img.shape[1] - 1 - np.argmax(np.flipud(cols))
    return rmin, rmax, cmin, cmax

This was about 10 µs faster for me than the bbox2 solution above on the same benchmark. There should also be a way to just use the result of argmax to find the non-zero rows and columns, avoiding the extra search done by using np.any, but this may require some tricky indexing that I wasn't able to get working efficiently with simple vectorized code.
I know this post is old and has already been answered, but I believe I've identified an optimized approach for large arrays and arrays loaded as np.memmaps. I was using ali_m's response, as optimized by Allen Zelener for smaller ndarrays, but that approach turns out to be quite slow for np.memmaps. Below is my implementation, which has extremely similar performance to ali_m's approach for arrays that fit in working memory, but far outperforms it when bounding large arrays or np.memmaps.

import numpy as np
from numba import njit, prange

@njit(parallel=True, nogil=True, cache=True)
def bound(volume):
    """
    Bounding function to bound large arrays and np.memmaps
    volume: A 3D np.array or np.memmap
    """
    mins = np.array(volume.shape)
    maxes = np.zeros(3)
    for z in prange(volume.shape[0]):
        for y in range(volume.shape[1]):
            for x in range(volume.shape[2]):
                if volume[z, y, x]:
                    if z < mins[0]:
                        mins[0] = z
                    if z > maxes[0]:
                        maxes[0] = z
                    if y < mins[1]:
                        mins[1] = y
                    if y > maxes[1]:
                        maxes[1] = y
                    if x < mins[2]:
                        mins[2] = x
                    if x > maxes[2]:
                        maxes[2] = x
    return mins, maxes

My approach is somewhat inefficient in the sense that it just iterates over every point rather than flattening the array over specific dimensions. However, I found flattening np.memmaps using np.any() with a dimension argument to be quite slow. I tried using numba to speed up the flattening, but it doesn't support np.any() with arguments. As such, I came to my iterative approach, which seems to perform quite well.

On my computer (2019 16" MacBook Pro, 6-core i7, 16 GB 2667 MHz DDR4), I'm able to bound an np.memmap with a shape of (1915, 4948, 3227) in ~33 seconds, as opposed to the ali_m approach, which takes around ~250 seconds. Not sure if anyone will ever see this, but hopefully it helps in the niche cases of needing to bound np.memmaps.
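For completeness, a minimal usage sketch (my addition; the shape and the filled block are placeholders):

import numpy as np

volume = np.zeros((64, 64, 64), dtype=np.uint8)
volume[10:20, 5:15, 30:40] = 1
mins, maxes = bound(volume)
# smallest and largest non-zero index along z, y, x;
# maxes comes back as floats because it was initialised with np.zeros
print(mins, maxes)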