Efficiently Using Multiple Numpy Slices for Random Image Cropping - python

I have a 4-D numpy array, with the first dimension representing the number of images in a data set, the second and third being the (equal) width and height, and the 4th being the number of channels (3). For example let's say I have 4 color images that are 28*28, so my image data looks like this:
X = np.reshape(np.arange(4*28*28*3), (4,28,28,3))
I would like to select a random 16*16 (width x height) crop of each of the 4 images. Critically, I want the crop to be different per image, i.e., I want to generate 4 random (x_offset, y_offset) pairs. In the end I want access to an array of shape (4, 16, 16, 3).
If I were to write this in a for loop it would look something like this:
x = np.random.randint(0,12,4)
y = np.random.randint(0,12,4)
for i in range(X.shape[0]):
    cropped_image = X[i, x[i]:x[i]+16, y[i]:y[i]+16, :]
    # Add cropped image to a list or something
But I'd like to do it as efficiently as possible and I'm wondering if there's a way to do it with strides and fancy indexing. I've seen the answers to this question, but can't quite wrap my head around how I might combine something like stride_tricks with random starting points for the strides on the second and third (width and height) axes.

Leverage strided-based method for efficient patch extraction
We can leverage scikit-image's view_as_windows, which is based on np.lib.stride_tricks.as_strided, to get sliding windows. These are merely views into the input array, so they incur no extra memory overhead and are virtually free to create. We could use np.lib.stride_tricks.as_strided directly, but the setup work required is hard to manage, especially on arrays with higher dimensions. If scikit-image is not available, we can directly use the source code that works standalone.
Explanation on usage of view_as_windows
The idea with view_as_windows is that we pass window_shape as a tuple whose length equals the number of dimensions of the input array. The axes along which we need to slide get their respective window lengths, and the rest get 1s. This creates an array of views with singleton axes (length 1) corresponding to the 1s in window_shape. For those axes, we index the zeroth element to get a squeezed version of the sliding windows.
Thus, we would have a solution, like so -
# Get sliding windows
from skimage.util.shape import view_as_windows
w = view_as_windows(X, (1,16,16,1))[...,0,:,:,0]
# Index and get our specific windows
out = w[np.arange(X.shape[0]),x,y]
# If you need those in the same format as in the posted loopy code
out = out.transpose(0,2,3,1)
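As a cross-check, here is a minimal self-contained sketch of the same idea. It swaps in NumPy's own np.lib.stride_tricks.sliding_window_view (available in NumPy 1.20 and later) for scikit-image's view_as_windows, so it is an alternative route rather than the code above. The seed, the offset range 0..12 and the names rng and loop_out are illustrative assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

X = np.reshape(np.arange(4 * 28 * 28 * 3), (4, 28, 28, 3))
rng = np.random.default_rng(0)       # seed chosen only for reproducibility (assumption)
x = rng.integers(0, 13, size=4)      # offsets 0..12 are all valid, since 28 - 16 = 12
y = rng.integers(0, 13, size=4)

# Slide over axes 1 and 2 only; the result is a view of shape (4, 13, 13, 3, 16, 16)
w = sliding_window_view(X, (16, 16), axis=(1, 2))

# Pick one window per image, then move channels last: (4, 3, 16, 16) -> (4, 16, 16, 3)
out = w[np.arange(X.shape[0]), x, y].transpose(0, 2, 3, 1)

# Reference result from the plain loop in the question
loop_out = np.stack([X[i, x[i]:x[i] + 16, y[i]:y[i] + 16, :] for i in range(X.shape[0])])
```

The window view itself costs no extra memory; only the final fancy-indexing step copies the four selected crops.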

Related

Combining n dimensional arrays

I am in the process of converting some matlab code to python. I am working with a 3d volume h x w x d represented as a numpy array, and I am extracting smaller 3d patches from this volume using the function from SO here. So if I have a 32x32x32 array and extract 16x16x16 patches, I end up with a shape of (2, 2, 2, 16, 16, 16). After processing each patch I would like to put it back into shape h x w x d, basically the reverse of window_nd. What would be the idiomatic numpy way to do this without looping over each dimension? Since I also need to work with 2d and 4d data, I would like to avoid creating a function for each dimension.
Normally, writing back to as_strided views is not advised because it can cause race conditions, but since you only made blocks, this should work:
original_shaped_array = windowed_array.transpose(0,3,1,4,2,5).reshape(32,32,32)
Additionally, if you never copied the windowed array and do the calculations in place, the data will be changed in the original array, since a windowed view is simply a new view into the same data. Don't do this if there is any overlap between windows.
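Since the question also asks about 2d and 4d data, the transpose-and-reshape step above can be generalized. The following is a rough sketch under the assumption that the blocks do not overlap and are laid out as (grid axes..., block axes...); the helper name merge_blocks is hypothetical and not part of NumPy.

```python
import numpy as np

def merge_blocks(blocks):
    """Reassemble non-overlapping blocks shaped (g1..gn, b1..bn) into (g1*b1, ..., gn*bn)."""
    n = blocks.ndim // 2
    # Interleave grid axes and block axes: (g1, b1, g2, b2, ...)
    order = [ax for pair in zip(range(n), range(n, 2 * n)) for ax in pair]
    shape = [blocks.shape[i] * blocks.shape[i + n] for i in range(n)]
    return blocks.transpose(order).reshape(shape)

volume = np.arange(32 ** 3).reshape(32, 32, 32)
# Build (2, 2, 2, 16, 16, 16) non-overlapping blocks without any helper library
blocks = volume.reshape(2, 16, 2, 16, 2, 16).transpose(0, 2, 4, 1, 3, 5)
restored = merge_blocks(blocks)

# The same helper handles a 2d case: a 4x6 image cut into 2x3 blocks
image = np.arange(24).reshape(4, 6)
image_blocks = image.reshape(2, 2, 2, 3).transpose(0, 2, 1, 3)
restored_image = merge_blocks(image_blocks)
```

For n = 3 the computed axis order is (0, 3, 1, 4, 2, 5), matching the one-liner above. The reshape after the transpose generally returns a copy, not a view.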

What is the fastest way to read in an image to an array of tuples?

I am trying to assign provinces to an area for use in a game mod. I have two separate maps for area and provinces.
provinces file,
area file.
Currently I am reading in an image in Python and storing it in an array using PIL like this:
import numpy as np
from PIL import Image

land_prov_pic = Image.open(INPUT_FILES_DIR + land_prov_str)
land_prov_array = np.array(land_prov_pic)
image_size = land_prov_pic.size
for x in range(image_size[0]):
    if x % 100 == 0:
        print(x)  # progress indicator
    for y in range(image_size[1]):
        land_prov_array[x][y] = land_prov_pic.getpixel((x,y))
Where you end up with land_prov_array[x][y] = (R,G,B)
However, this gets really slow, especially for large images. I tried reading it in using opencv like this:
import cv2
land_prov_array = cv2.imread(INPUT_FILES_DIR + land_prov_str)
land_prov_array = cv2.cvtColor(land_prov_array, cv2.COLOR_BGR2RGB) #Convert from BGR to RGB
But now land_prov_array[x][y] = [R G B] which is an ndarray and can't be inserted into a set. But it's way faster than the previous for loop. How do I convert [R G B] to (R,G,B) for every element in the array without for loops or, better yet, read it in that way?
EDIT: Added pictures, more description, and code blocks for readability.
It is best to convert the [R,G,B] array to tuple when you need it to be a tuple, rather than converting the whole image to this form. An array of tuples takes up a lot more memory, and will be a lot slower to process, than a numeric array.
The answer by isCzech shows how to create a NumPy view over a 3D array that presents the data as if it were a 2D array of tuples. This might not require the additional memory of an actual array of tuples, but it is still a lot slower to process.
Most importantly, most NumPy functions (such as np.mean) and operators (such as +) cannot be applied to such an array. Thus, one is obliged to iterate over the array in Python code (or with a function decorated with @np.vectorize), which is a lot less efficient than using NumPy functions and operators that work on the array as a whole.
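To illustrate the advice to convert only when a tuple is needed, here is a small sketch. The tiny img array is a made-up stand-in for a loaded image, and collecting the distinct colours with np.unique is one possible approach, assumed here because the question wants to insert pixels into a set.

```python
import numpy as np

# Hypothetical stand-in for an image loaded as an (H, W, 3) uint8 array
img = np.array([[[255, 0, 0], [0, 255, 0]],
                [[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Convert a single pixel to a hashable tuple only at the point of use
pixel = tuple(int(c) for c in img[0, 1])

# Find the distinct colours with NumPy first, then convert just those rows to tuples
unique_colors = np.unique(img.reshape(-1, 3), axis=0)
color_set = {tuple(int(c) for c in row) for row in unique_colors}
```

The Python-level loop then runs over the handful of distinct colours, not over every pixel.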
For transformation from a 3D array (data3D) to a 2D array (data2D), I've used this approach:
import numpy as np
dt = np.dtype([('x', 'u1'), ('y', 'u1'), ('z', 'u1')])
data2D = data3D.view(dtype=dt).squeeze()
The .view call changes the data type and still returns a 3D array, now with a last dimension of size 1, which can then be removed by .squeeze. Alternatively, you can use .squeeze(axis=-1) to squeeze only the last dimension (in case some of your other dimensions are of size 1 too).
Please note I've used uint8 ('u1') - your type may be different.
Trying to do this using a loop is very slow, indeed (compared to this approach at least).
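A short runnable sketch of the structured-view approach, using a small made-up data3D array in place of a real image. The np.ascontiguousarray call is an added precaution (an assumption, not part of the answer) because this kind of view needs the last axis to be contiguous in memory.

```python
import numpy as np

# Made-up (2, 3, 3) uint8 array standing in for an image of height 2 and width 3
data3D = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)

dt = np.dtype([('x', 'u1'), ('y', 'u1'), ('z', 'u1')])
# View each run of 3 bytes as one structured element, then drop the size-1 axis
data2D = np.ascontiguousarray(data3D).view(dtype=dt).squeeze(axis=-1)

# .item() on a structured element returns a plain Python tuple
first_row_second_pixel = data2D[0, 1].item()
```

Individual channels remain accessible by field name, for example data2D['y'].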
Similar question here: Show a 2d numpy array where contents are tuples as an image

Efficiently filter 3D matrix in numpy with variable 2D masks

I have a 3D numpy array points of dimensions [10000x3000x128] where the first dimension is the number of frames, the second dimension is the number of points in each frame, and the third dimension is a 128-element feature vector associated with each point. What I want to do is efficiently filter the points in each frame using a boolean 2D mask of dimensions [10000x3000] and, for each of the selected points, also take the related 128-dim feature vector. Moreover, I still need a 3D array as output, not a merged 2D array, and I would like to avoid any for loop if possible.
What I'm currently doing is:
# points = 3D np.array of shape (10000, 3000, 128)
# fg, bg = 2D boolean np.arrays of shape (10000, 3000)
# init empty lists
fg_points, bg_points = [], []
for i in range(points.shape[0]):
    fg_mask_tmp, bg_mask_tmp = fg[i], bg[i]
    fg_points.append(points[i,fg_mask_tmp,:])
    bg_points.append(points[i,bg_mask_tmp,:])
fg_features, bg_features = np.array(fg_points), np.array(bg_points)
But this is a quite naive solution that for sure can be improved in a more numpy-like way.
In addition, I also tried other solutions as:
fg_features = points[fg,:]
But this solution does not preserve the dimensions of the array: it merges the first two dimensions, since the number of filtered points for each frame can vary.
Another solution I tried is to enlarge the 2D masks by appending a [128] true value to the last dimension, but without any successful result.
Does anyone know a possible efficient solution?
Thank you in advance for any help!
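One possible direction, offered only as a sketch: because the number of selected points differs per frame, the result cannot be a regular 3D array, but the merged output of points[fg] can be split back into one array per frame without an explicit loop over frames. The small shapes and the random data below are assumptions made to keep the example light.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.random((4, 5, 3))      # small stand-in: 4 frames, 5 points, 3 features
fg = rng.random((4, 5)) > 0.5       # hypothetical foreground mask

flat = points[fg]                   # shape (fg.sum(), 3), frames concatenated in order
counts = fg.sum(axis=1)             # number of selected points in each frame
# Split at the cumulative counts to recover one (n_i, 3) array per frame
per_frame = np.split(flat, np.cumsum(counts)[:-1])
```

The result is a list of arrays with differing first dimensions; padding them to a common length would be needed to obtain a true 3D array.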

Changing colorspaces with numpy.tensordot

I have an image I've read from file with shape (m,n,3) (i.e. it has 3 channels). I also have a matrix to convert the color space with dimensions (3,3). I've already arrived at a few different ways of applying this matrix to each vector in the image; for example,
np.einsum('ij,...j',transform,image)
appears to make for the same results as the following (far slower) implementation.
def convert(im: np.ndarray, transform: np.ndarray) -> np.ndarray:
    """ Convert an image array to another colorspace """
    dimensions = len(im.shape)
    axes = im.shape[:dimensions-1]
    # Create a new array (respecting mutability)
    new_ = np.empty(im.shape)
    for coordinate in np.ndindex(axes):
        pixel = im[coordinate]
        pixel_prime = transform @ pixel
        new_[coordinate] = pixel_prime
    return new_
However, I found that the following is even more efficient while testing on the example image with line_profiler.
np.moveaxis(np.tensordot(transform, X, axes=((-1),(-1))), 0, 2)
The problem I'm having here is using just a np.tensordot, i.e. removing the need for np.moveaxis. I've spent a few hours attempting to find a solution (I'm guessing it resides in choosing the correct axes), so I thought I'd ask others for help.
You can do it concisely with tensordot if you make image the first argument:
np.tensordot(image, transform, axes=(-1, 1))
You can get better performance from einsum by using the argument optimize=True (requires numpy 1.12 or later):
np.einsum('ij,...j', transform, image, optimize=True)
Or (as Paul Panzer pointed out in a comment), you can simply use matrix multiplication:
image @ transform.T
They all take about the same time on my computer.
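A quick way to convince yourself that these variants agree is to compare them on random data. The sketch below assumes a small random image and transform purely for illustration; it makes no claim about timings.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((4, 5, 3))       # stand-in for an (m, n, 3) image
transform = rng.random((3, 3))      # stand-in for a colour-space matrix

# Contract the channel axis of image with the second axis of transform
via_tensordot = np.tensordot(image, transform, axes=(-1, 1))
via_einsum = np.einsum('ij,...j', transform, image, optimize=True)
via_matmul = image @ transform.T

# The variant from the question: transform first, then move the new axis to the end
via_moveaxis = np.moveaxis(np.tensordot(transform, image, axes=(-1, -1)), 0, -1)
```

Putting image first in tensordot leaves the contracted result's channel axis last, which is why no moveaxis is needed.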

Numpy flipped coordinate system

I used opencv to do some image processing. I tried to then plot the image on my plotter (origin at lower left), however the image is flipped. opencv's origin is in the upper left, and so the y coordinates of the image are flipped.
What function should I apply to my points such that it will plot properly in the new origin system (lower left)?
EDIT:
I am not concerned with changing the plot display, I actually need the points' coordinates flipped.
Using np.flipud did not change the points at all, since the points are represented by an N x 2 matrix.
The problem does not lie in numpy but in the way matplotlib displays data. To produce a valid visualization, you should flip the y-axis at the image generation level, not in the numpy analysis. This is easily done through the matplotlib API on the axes object:
plt.gca().invert_yaxis()
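If the point coordinates themselves must be flipped, as the question's edit asks, one option is to mirror the y column about the image height. This is a sketch under stated assumptions: points are stored as (x, y) pixel coordinates in an N x 2 array, and the image height is known. The helper name flip_y is hypothetical.

```python
import numpy as np

def flip_y(points, height):
    """Return a copy of (N, 2) points with y measured from the bottom edge."""
    flipped = np.array(points, dtype=float)        # np.array copies, so the input is untouched
    flipped[:, 1] = (height - 1) - flipped[:, 1]   # row 0 becomes row height - 1
    return flipped

pts = np.array([[0, 0], [3, 9], [5, 4]])
flipped_pts = flip_y(pts, height=10)
```

Use height - y without the -1 if the coordinates are continuous positions and not pixel indices.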
Are you asking how to flip a numpy array or how to display it?
If you're asking how to flip the array, have a look at np.flipud or equivalently your_data[::-1, ...]
numpy.flipud is a function that uses the exact slicing shown above. However, it's more readable if you're not familiar with numpy slicing.
To explain why data[::-1, ...] will flip the array vertically, you need to understand a bit about how indexing works.
In general, indexing in python works by specifying start:stop:step. Each of these may be None (e.g. :-10 specifies start=None, stop=-10, step=None).
Therefore, ::-1 specifies start=None, stop=None, step=-1 -- in other words, go over the full sequence, but increment with a negative step, effectively reversing the sequence.
... is an Ellipsis. In numpy, this is used to indicate including all other dimensions.
The ellipsis avoids the need to special case your array being 2D or 3D (or 27-dimensional, for that matter). If it's a 2D array, then x[::-1, ...] is equivalent to x[::-1, :]. If it's a 3D array, it's equivalent to x[::-1, :, :], etc.
In numpy, the first axis is rows. Therefore, x[::-1, ...] says "reverse the rows and leave all other dimensions alone". This creates a view, so the memory won't be duplicated and no copy will be made.
In the specific case of rows, you could leave the ellipsis out. However, it's useful to think about for the general case. For example, flipping left-right would be x[:, ::-1, ...] (or np.fliplr).
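The slicing behaviour described above can be checked on a small array. This is just an illustrative sketch; the array x and its shape are arbitrary.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

flipped_rows = x[::-1, ...]      # same result as np.flipud(x)
flipped_cols = x[:, ::-1, ...]   # same result as np.fliplr(x)

# Both results are views: they share memory with x, no copy is made
rows_share = np.shares_memory(flipped_rows, x)
cols_share = np.shares_memory(flipped_cols, x)
```

Because these are views, writing into flipped_rows also changes x.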
