I need to slice sections out of a NumPy array in a specific way. Say I have a NumPy array of shape (200, 200, 4). For every index in (200, 200), I want to select the surrounding 5x5x4 block, flatten it, and put it into another array, so the final array would have shape (200, 200, 100). Additionally, I want to delete all values at the location (:, :, 12), which leaves a final shape of (200, 200, 99).
I've thought of two ways to go about this but they give different results and I'm not sure what I'm doing wrong.
Method 1:
import numpy as np
arr_lst = [np.random.normal(size=(200, 200)) for _ in range(4)]
slice_arr = np.zeros([200, 200, 99])
start = 0
for i, arr in enumerate(arr_lst):
    for idx, _ in np.ndenumerate(arr):
        # Getting surrounding 25 pixels
        pos_arr = arr[idx[0]-2:idx[0]+3, idx[1]-2:idx[1]+3]
        # Flattening into size 25
        pos_arr = pos_arr.reshape(-1)
        # Near the boundaries slicing does not result in size 25
        if pos_arr.shape[0] != 25:
            pos_arr = np.full(25, np.nan)
        if i == 0:
            # Drop the centre pixel of the first array only
            pos_arr = np.delete(pos_arr, 12)
            end = start + 25 - 1
        else:
            end = start + 25
        slice_arr[idx[0], idx[1], start:end] = pos_arr
    start = end
print(slice_arr[10, 100])
Method 2:
import numpy as np
arr_lst = [np.random.normal(size=(200, 200)) for _ in range(4)]
stacked_arr = np.stack(arr_lst, axis=2)
slice_arr = np.zeros([200, 200, 100])
for i in range(200):
    for j in range(200):
        x = stacked_arr[i-2:i+3, j-2:j+3, 0:4]
        if x.shape != (5, 5, 4):
            x = np.array([np.nan for _ in range(100)])
        else:
            x = x.reshape(100)
        slice_arr[i,j] = x
slice_arr = np.delete(slice_arr, 12, 2)
print(slice_arr[10, 100])
The first method gives me the array that I want in the correct order, but the second method feels more natural and faster. I would also like to know whether this can be optimized. Is there a fast way to slice around every index at the same time, keeping each slice the same shape, and then delete the unwanted values afterwards?
Using @hpaulj's helpful comments, I designed a solution that I think works for my purposes. It's similar to what was suggested here: Rolling windows for ndarrays, but it adds a border of np.nan values. In case anyone else finds this useful, I've posted it here. For debugging purposes, I've set the values in the padded array to coordinate tuples:
import numpy as np
from skimage.util.shape import view_as_windows
arr_lst = [np.empty(shape=(200, 200), dtype=tuple) for _ in range(4)]
arr_lst = [np.pad(x, pad_width=2, mode='constant', constant_values=np.nan) for x in arr_lst]
padded_arr = np.stack(arr_lst, axis=2)
for idx, _ in np.ndenumerate(padded_arr):
    padded_arr[idx[0], idx[1], idx[2]] = idx
w = view_as_windows(padded_arr, (5, 5, 4)).reshape(200, 200, 100)
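As for why the two methods in the question disagree: Method 1 writes the 25 values of each array one after another (channel-major order), so index 12 is the centre pixel of the first array, whereas Method 2 flattens a (5, 5, 4) block in which the four channels are interleaved, so index 12 refers to a different pixel. The following is a minimal sketch of a NumPy-only variant that keeps Method 1's ordering, assuming NumPy 1.20 or newer for sliding_window_view. The function name windows_channel_major and its half parameter are made up for illustration. Unlike Method 1, border pixels here end up partially NaN (from the padding) rather than entirely NaN.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # needs NumPy >= 1.20


def windows_channel_major(stacked, half=2):
    """Hypothetical helper: flatten the (2*half+1)^2 neighbourhood of every
    pixel, channel by channel, and drop the centre value of channel 0."""
    k = 2 * half + 1
    h, w, c = stacked.shape
    # Pad only the two spatial axes with NaN so every pixel has a full window
    padded = np.pad(stacked, ((half, half), (half, half), (0, 0)),
                    mode="constant", constant_values=np.nan)
    # Window axes are appended at the end: shape (h, w, c, k, k)
    win = sliding_window_view(padded, (k, k), axis=(0, 1))
    # Reshaping copies the data; each channel's k*k values stay contiguous
    flat = win.reshape(h, w, c * k * k)
    # In this ordering, index (k*k)//2 is the centre of the first channel
    return np.delete(flat, (k * k) // 2, axis=2)


stacked = np.random.normal(size=(200, 200, 4))
slice_arr = windows_channel_major(stacked)
print(slice_arr.shape)
```

If Method 2's interleaved ordering is preferred instead, the window array could be transposed to (h, w, k, k, c) before reshaping; in that case the index to delete would no longer be 12.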
Related
I have a numpy array of 300x300 where I want to keep all elements periodically. Specifically, for both axes I want to keep the first 5 elements, then discard 15, keep 5, discard 15, etc. This should result in an array of 75x75 elements. How can this be done?
You can create a 1D mask that carries out the keep/discard function, repeat it to cover the full length of an axis, and then apply it to both axes of the array. Here is an example.
import numpy as np
size = 300
array = np.arange(size).reshape((size, 1)) * np.arange(size).reshape((1, size))
mask = np.concatenate((np.ones(5), np.zeros(15))).astype(bool)
period = len(mask)
mask = np.repeat(mask.reshape((1, period)), repeats=size // period, axis=0)
mask = np.concatenate(mask, axis=0)
result = array[mask][:, mask]
print(result.shape)
You can view the array as series of 20x20 blocks, of which you want to keep the upper-left 5x5 portion. Let's say you have
keep = 5
discard = 15
This only works if
assert all(s % (keep + discard) == 0 for s in arr.shape)
First compute the shape of the view and use it:
block = keep + discard
shape1 = (arr.shape[0] // block, block, arr.shape[1] // block, block)
view = arr.reshape(shape1)[:, :keep, :, :keep]
The following operation will create a copy of the data because the view creates a non-contiguous buffer:
shape2 = (shape1[0] * keep, shape1[2] * keep)
result = view.reshape(shape2)
You can compute shape1 and shape2 in a more general manner with something like
shape1 = tuple(
    np.stack((np.array(arr.shape) // block,
              np.full(arr.ndim, block)), -1).ravel())
shape2 = tuple(np.array(shape1[::2]) * keep)
I would recommend packaging this into a function.
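One possible way to package the steps above into a function is sketched below. The name keep_periodic and its signature are assumptions made for illustration, not something given in the answer; the logic follows the reshape, slice, reshape sequence described above and works for any number of dimensions.

```python
import numpy as np


def keep_periodic(arr, keep=5, discard=15):
    """Hypothetical helper: along every axis keep `keep` elements,
    then skip `discard`, repeating until the end of the axis."""
    block = keep + discard
    if any(s % block for s in arr.shape):
        raise ValueError("every dimension must be a multiple of keep + discard")
    # Split each axis of length s into (s // block, block)
    shape1 = []
    for s in arr.shape:
        shape1 += [s // block, block]
    # Take everything along the "block number" axes, the first `keep`
    # entries along the "position inside block" axes
    index = (slice(None), slice(None, keep)) * arr.ndim
    view = arr.reshape(shape1)[index]
    # This reshape copies, because the sliced view is not contiguous
    shape2 = [s // block * keep for s in arr.shape]
    return view.reshape(shape2)


arr = np.arange(300 * 300).reshape(300, 300)
result = keep_periodic(arr)
print(result.shape)
```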
Here is my first thought of a solution. Will update later if I think of one with fewer lines. This should work even if the input is not square:
output = []
for i in range(len(arr)):
    tmp = []
    if i % (15+5) < 5: # keep first 5, then discard next 15
        for j in range(len(arr[i])):
            if j % (15+5) < 5: # keep first 5, then discard next 15
                tmp.append(arr[i,j])
        output.append(tmp)
Update:
Building off of Yang's answer, here is another way which uses np.tile, which repeats an array a given number of times along each axis. This relies on the input array being square in dimension.
import numpy as np
# Define one instance of the keep/discard box
keep, discard = 5, 15
mask = np.concatenate([np.ones(keep), np.zeros(discard)])
mask_2d = mask.reshape((keep+discard,1)) * mask.reshape((1,keep+discard))
# Tile it out -- overshoot, then trim to match size
count = len(arr)//len(mask_2d) + 1
tiled = np.tile(mask_2d, [count,count]).astype('bool')
tiled = tiled[:len(arr), :len(arr)]
# Apply the mask to the input array
dim = sum(tiled[0])
output = arr[tiled].reshape((dim,dim))
Another option using meshgrid and a modulo:
# MyArray = 300x300 numpy array
r = np.r_[0:300] # Indices 0..299
xv, yv = np.meshgrid(r, r) # x and y grid
mask = ((xv%20)<5) & ((yv%20)<5) # We create the boolean mask
result = MyArray[mask].reshape((75,75)) # We apply the mask and reshape the final output
I have this code, and it works. It just seems like there may be a better way to do this. Does anyone know a cleaner solution?
def Matrix2toMatrix(Matrix2):
    scaleSize = len(Matrix2[0, 0])
    FinalMatrix = np.empty([len(Matrix2)*scaleSize, len(Matrix2[0])*scaleSize])
    for x in range(0, len(Matrix2)):
        for y in range(0, len(Matrix2[0])):
            for xFinal in range(0, scaleSize):
                for yFinal in range(0, scaleSize):
                    FinalMatrix[(x*scaleSize)+xFinal, (y*scaleSize)+yFinal] = Matrix2[x, y][xFinal, yFinal]
    return FinalMatrix
This is where Matrix2 is a 4x4 matrix, with each cell containing a 2x2 matrix
Full code in case anyone was wondering:
import matplotlib.pyplot as plt
import numpy as np
def Matrix2toMatrix(Matrix2):
    scaleSize = len(Matrix2[0, 0])
    FinalMatrix = np.empty([len(Matrix2)*scaleSize, len(Matrix2[0])*scaleSize])
    for x in range(0, len(Matrix2)):
        for y in range(0, len(Matrix2[0])):
            for xFinal in range(0, scaleSize):
                for yFinal in range(0, scaleSize):
                    FinalMatrix[(x*scaleSize)+xFinal, (y*scaleSize)+yFinal] = Matrix2[x, y][xFinal, yFinal]
    return FinalMatrix
XSize = 4
Xtest = np.array([[255, 255, 255, 255]
                 ,[255, 255, 255, 255]
                 ,[127, 127, 127, 127]
                 ,[0, 0, 0, 0]
                 ])
scaleFactor = 2
XMarixOfMatrix = np.empty([XSize, XSize], dtype=object)
Xexpanded = np.empty([XSize*scaleFactor, XSize*scaleFactor], dtype=int) # careful, will contain garbage data
for xOrg in range(0, XSize):
    for yOrg in range(0, XSize):
        newMatrix = np.empty([scaleFactor, scaleFactor], dtype=int) # careful, will contain garbage data
        # grab org point equivalent
        pointValue = Xtest[xOrg, yOrg]
        newMatrix.fill(pointValue)
        # now write the data
        XMarixOfMatrix[xOrg, yOrg] = newMatrix
# need to concat all matrix together to form a larger singular matrix
Xexpanded = Matrix2toMatrix(XMarixOfMatrix)
img = plt.imshow(Xexpanded)
img.set_cmap('gray')
plt.axis('off')
plt.show()
Permute axes and reshape -
m,n = Matrix2.shape[0], Matrix2.shape[2]
out = Matrix2.swapaxes(1,2).reshape(m*n,-1)
For permuting axes, we could also use np.transpose or np.rollaxis, as functionally all are the same.
Verify with sample run -
In [17]: Matrix2 = np.random.rand(3,3,3,3)
# With given solution
In [18]: out1 = Matrix2toMatrix(Matrix2)
In [19]: m,n = Matrix2.shape[0], Matrix2.shape[2]
...: out2 = Matrix2.swapaxes(1,2).reshape(m*n,-1)
In [20]: np.allclose(out1, out2)
Out[20]: True
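Note that the sample run above uses a genuine 4D array, while the full code in the question builds a 2D array of dtype=object whose cells are small matrices. The sketch below shows one way the two could be bridged: converting the object array to 4D first (via tolist, which is an assumption about how one might do it, not part of the answer) and then applying the same swapaxes and reshape. The np.repeat line is included only as a cross-check for this particular upscaling task.

```python
import numpy as np

Xtest = np.array([[255, 255, 255, 255],
                  [255, 255, 255, 255],
                  [127, 127, 127, 127],
                  [0, 0, 0, 0]])
scale = 2

# Build the object array of small matrices, as in the question
cells = np.empty(Xtest.shape, dtype=object)
for x in range(Xtest.shape[0]):
    for y in range(Xtest.shape[1]):
        cells[x, y] = np.full((scale, scale), Xtest[x, y])

# Turn the object array into a regular 4D array of shape (4, 4, 2, 2)
blocks = np.array(cells.tolist())

# Same permute-and-reshape step as in the answer
m, n = blocks.shape[0], blocks.shape[2]
expanded = blocks.swapaxes(1, 2).reshape(m * n, -1)

# Cross-check: repeating each pixel along both axes gives the same image
direct = np.repeat(np.repeat(Xtest, scale, axis=0), scale, axis=1)
print(expanded.shape, np.array_equal(expanded, direct))
```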
I know this is supposed to be simple but I can't figure it out.
The problem:
gt_prices = np.random.uniform(0, 100, size = (121147, 28))
pred_idxs = np.random.randint(0, 28 , size = (121147,))
print(gt_prices.shape, pred_idxs.shape)
(121147, 28) (121147,)
I want to get an array of shape (121147,), where for each row I have the element of gt_prices in the position given by pred_idxs.
In other words, I want to do this:
selected_prices = np.array([gt_prices[i, pred_idxs[i]] for i in range(gt_prices.shape[0])])
But I'd like to do everything with NumPy. Is this possible?
You can do the following (I used a smaller dimension of 3 to make checking correctness easier):
gt_prices = np.random.uniform(0, 100, size = (3, 28))
pred_idxs = np.random.randint(0, 28 , size = (3,))
indices = np.expand_dims(pred_idxs, axis=1)
gt_prices[np.arange(gt_prices.shape[0])[:,None], indices]
There is now an easy wrapper for this from numpy: https://numpy.org/devdocs/reference/generated/numpy.take_along_axis.html
For your usage, I believe it would be:
gt_prices = np.random.uniform(0, 100, size = (121147, 28))
pred_idxs = np.random.randint(0, 28 , size = (121147, 1)) # number of dimensions has to match
your_output = np.take_along_axis(gt_prices, pred_idxs, axis=1) # output shape [121147, 1]
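To get the flat (n,) result asked for in the question, the trailing axis can be dropped again. Here is a small sketch comparing the original list comprehension with plain integer-array indexing and with take_along_axis; the array sizes are reduced and the variable names by_loop, by_indexing and by_take are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
gt_prices = rng.uniform(0, 100, size=(6, 28))
pred_idxs = rng.integers(0, 28, size=6)

rows = np.arange(gt_prices.shape[0])

# Reference: the list comprehension from the question
by_loop = np.array([gt_prices[i, pred_idxs[i]] for i in rows])

# Integer-array indexing: pair row i with column pred_idxs[i]
by_indexing = gt_prices[rows, pred_idxs]

# take_along_axis needs matching ndim, so add an axis and remove it again
by_take = np.take_along_axis(gt_prices, pred_idxs[:, None], axis=1)[:, 0]

print(by_indexing.shape)
print(np.array_equal(by_loop, by_indexing), np.array_equal(by_loop, by_take))
```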
I have two input arrays: data_arr of dimensions (i,j,k) and index_arr of dimensions (i,j). The entries in index_arr are integers in the range [0, k-1]. I would like to create an output array (output_arr) of dimensions (i,j) where, for each element of output_arr, index_arr tells me which of the k elements of data_arr to choose.
In other words output_arr[i,j] = data_arr[i,j, index_arr[i, j]]
Clearly I could do this at a glacial pace with a double for loop. I would prefer something snappier using smart indexing. Currently the best I could devise involves creating two extra 2D matrices of size (i,j).
Below is a simple MWE framed in terms of creating a mosaiced image from an RGB image using a standard Bayer pattern. I would like to be able to get rid of X_ind and Y_ind.
import numpy as np
import time
if __name__ == '__main__':
    img_width = 1920
    img_height = 1080
    img_num_colours = 3
    red_arr = np.ones([img_height, img_width], dtype=np.uint16) * 10
    green_arr = np.ones([img_height, img_width], dtype=np.uint16) * 20
    blue_arr = np.ones([img_height, img_width], dtype=np.uint16) * 30
    img_arr = np.dstack((red_arr, green_arr, blue_arr))
    bayer_arr = np.ones([img_height, img_width], dtype=np.uint16)
    bayer_arr[0::2,0::2] = 0 # Red entries in Bayer pattern
    # Green entries are already set by np.ones initialisation
    bayer_arr[1::2,1::2] = 2 # Blue entries in Bayer pattern
    print("bayer\n",bayer_arr[:8,:12], "\n")
    mosaiced_arr = np.zeros([img_height, img_width], dtype=np.uint16)
    Y_ind = np.repeat(np.arange(0, img_width).reshape(1, img_width), img_height, 0)
    X_ind = np.repeat(np.arange(0, img_height).reshape(img_height, 1), img_width, 1)
    start_time = time.time()
    demos_arr = img_arr[X_ind, Y_ind, bayer_arr]
    end_time = time.time()
    print(demos_arr.shape)
    print("demos\n",demos_arr[:8,:12], "\n")
    print("Mosaic took {:.3f}s".format(end_time - start_time))
Edit:
As pointed out by @Georgy, this question is similar to this one, which I didn't find with my search terms, so maybe this post will act as a signpost for that one. The answers in the other post are applicable, although the flattened index arithmetic is different since the ordering of my dimensions is different. The answer above is equivalent to the ogrid version in the other question. In fact, ogrid can be used by making the following change to the code:
# Y_ind = np.repeat(np.arange(0, img_width).reshape(1, img_width), img_height, 0)
# X_ind = np.repeat(np.arange(0, img_height).reshape(img_height, 1), img_width, 1)
X_ind, Y_ind = np.ogrid[0:img_height, 0:img_width]
You can implement the choose option (limited to choosing between 32 options) like so:
start_time = time.time()
demos_arr = bayer_arr.choose((img_arr[...,0], img_arr[...,1], img_arr[...,2]))
end_time = time.time()
The ogrid solution runs in 12ms and the choose solution in 34ms on my machine
You want numpy.take_along_axis:
output_arr = numpy.take_along_axis(data_arr, index_arr[:, :, numpy.newaxis], axis=2)
output_arr = output_arr[:,:,0] # Since take_along_axis keeps the same number of dimensions
This function is new in numpy 1.15.0.
https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.take_along_axis.html
Note that data_arr and index_arr need to have the same number of dimensions, so you need to reshape index_arr to 3 dimensions and then reshape the result back to 2 dimensions. I.e.:
start_time = time.time()
demos_arr = np.take_along_axis(img_arr, bayer_arr.reshape([img_height, img_width, 1]), axis=2).reshape([img_height, img_width])
end_time = time.time()
The timing results for take_along_axis are the same as for the ogrid implementation.
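The three approaches discussed in this thread (ogrid indexing, choose, and take_along_axis) can be checked against the double loop on a small array. This is a sketch with made-up sizes and variable names; it says nothing about relative speed, only that the results agree.

```python
import numpy as np

rng = np.random.default_rng(0)
data_arr = rng.integers(0, 1000, size=(4, 6, 3))
index_arr = rng.integers(0, 3, size=(4, 6))

# Reference result from the slow double loop
expected = np.empty((4, 6), dtype=data_arr.dtype)
for i in range(4):
    for j in range(6):
        expected[i, j] = data_arr[i, j, index_arr[i, j]]

# Open grids broadcast to (4, 6) without materialising full index matrices
rows, cols = np.ogrid[0:4, 0:6]
via_ogrid = data_arr[rows, cols, index_arr]

# choose picks, per position, from a sequence of 2D candidate arrays
via_choose = np.choose(index_arr, [data_arr[..., c] for c in range(3)])

# take_along_axis needs an index array with the same ndim as the data
via_take = np.take_along_axis(data_arr, index_arr[..., np.newaxis], axis=2)[..., 0]

print(np.array_equal(expected, via_ogrid),
      np.array_equal(expected, via_choose),
      np.array_equal(expected, via_take))
```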
I am creating a NumPy array of size 20x30x30x3 in each iteration of a for loop. I want to concatenate all of those arrays into a bigger one. If there are 100 iterations, the resulting array should be 2000x30x30x3. I tried to do this with lists:
new_one_arr1_list = []
new_one_arr2_list = []
all_arr1 = np.array([])
for item in one_arr1: # 100 iterations
    item = np.reshape(item, (1, 30, 30, 3))
    new_one_arr1 = np.repeat(item, 20, axis=0)
    all_arr1 = np.concatenate(([all_arr1 , new_one_arr1 ]))
    ind = np.random.randint(one_arr2.shape[0], size=(20,))
    new_one_arr2= one_arr1[ind]
    new_one_arr1_list.append(new_one_arr1)
    new_one_arr2_list.append(new_one_arr2)
In each iteration, new_one_arr1 and new_one_arr2 have size 20x30x30x3. When I convert new_one_arr1_list and new_one_arr2_list to arrays at the end, their size is 100x20x30x30x3. How can I get a 2000x30x30x3 NumPy array instead?
EDIT: I tried to use concatenate to add the arrays within a numpy array all_arr1 using: all_arr1= np.concatenate(([all_arr1, new_one_arr1])) however, I received the message:
ValueError: all the input arrays must have same number of dimensions
To build the concatenation and work around the error, I initialized the array with None and test for None inside the loop.
That way you do not have to worry about mismatched dimensions.
However, I had to create some arrays for the ones you only described, and ended up with a final dimension of (400, 30, 30, 3).
That is consistent, since 20*20 = 400.
Hope this helps with your solution.
new_one_arr1_list = []
new_one_arr2_list = []
one_arr1 = np.ones((20,30,30,3))
one_arr2 = np.ones((20,30,30,3))
all_arr1 = None
count = 0
for item in one_arr1: # 100 iterations
    item = np.reshape(item, (1, 30, 30, 3))
    new_one_arr1 = np.repeat(item, 20, axis=0)
    # print(all_arr1.shape, new_one_arr1.shape)
    if all_arr1 is None:
        all_arr1 = new_one_arr1
    else:
        all_arr1 = np.concatenate(([all_arr1 , new_one_arr1 ]), axis=0)
    ind = np.random.randint(one_arr2.shape[0], size=(20,))
    new_one_arr2= one_arr1[ind]
    new_one_arr1_list.append(new_one_arr1)
    new_one_arr2_list.append(new_one_arr2)
    count += 1
print(count)
all_arr1.shape
Use the np.concatenate operation described in the documentation:
https://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.concatenate.html
Don't concatenate in the first iteration, since that raises the dimension error; just assign the array on the first iteration. For the remaining iterations, keep concatenating.
new_one_arr1_list = []
new_one_arr2_list = []
all_arr1 = np.array([])
firstIteration = True
for item in one_arr1: # 100 iterations
    item = np.reshape(item, (1, 30, 30, 3))
    new_one_arr1 = np.repeat(item, 20, axis=0)
    if firstIteration:
        all_arr1 = new_one_arr1
        firstIteration=False
    else:
        all_arr1 = np.concatenate(([all_arr1 , new_one_arr1 ]))
    ind = np.random.randint(one_arr2.shape[0], size=(20,))
    new_one_arr2= one_arr1[ind]
    new_one_arr1_list.append(new_one_arr1)
    new_one_arr2_list.append(new_one_arr2)
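An alternative that avoids both the special first iteration and the repeated reallocation is to keep appending to the lists and concatenate once after the loop. The sketch below assumes one_arr1 and one_arr2 have shape (100, 30, 30, 3); those inputs are random stand-ins, since the question does not show them. It also samples from one_arr2, which is an assumption, as the original code indexes one_arr1 at that point.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the arrays the question only describes
one_arr1 = rng.random((100, 30, 30, 3), dtype=np.float32)
one_arr2 = rng.random((100, 30, 30, 3), dtype=np.float32)

chunks1, chunks2 = [], []
for item in one_arr1:  # 100 iterations
    # item[None] has shape (1, 30, 30, 3); repeat it 20 times along axis 0
    chunks1.append(np.repeat(item[None], 20, axis=0))
    # Assumption: draw 20 random samples from one_arr2
    ind = rng.integers(one_arr2.shape[0], size=20)
    chunks2.append(one_arr2[ind])

# One concatenation at the end joins the 100 chunks along the first axis
all_arr1 = np.concatenate(chunks1, axis=0)
all_arr2 = np.concatenate(chunks2, axis=0)
print(all_arr1.shape, all_arr2.shape)

# For the first array the loop is not needed at all
all_arr1_direct = np.repeat(one_arr1, 20, axis=0)
```

An array that already has shape (100, 20, 30, 30, 3) can also be flattened afterwards with .reshape(-1, 30, 30, 3), which gives the same ordering as the concatenation.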