Changing colorspaces with numpy.tensordot - python

I have an image I've read from file with shape (m,n,3) (i.e. it has 3 channels). I also have a matrix to convert the color space with dimensions (3,3). I've already arrived at a few different ways of applying this matrix to each vector in the image; for example,
np.einsum('ij,...j', transform, image)
appears to produce the same results as the following (far slower) implementation.
import numpy as np

def convert(im: np.ndarray, transform: np.ndarray) -> np.ndarray:
    """Convert an image array to another colorspace."""
    dimensions = len(im.shape)
    axes = im.shape[:dimensions - 1]
    # Create a new array (respecting mutability)
    new_ = np.empty(im.shape)
    for coordinate in np.ndindex(axes):
        pixel = im[coordinate]
        pixel_prime = transform @ pixel  # apply the colorspace matrix to one pixel
        new_[coordinate] = pixel_prime
    return new_
However, while testing on the example image with line_profiler, I found that the following is even more efficient.
np.moveaxis(np.tensordot(transform, image, axes=(-1, -1)), 0, 2)
The problem I'm having is doing this with just np.tensordot, i.e. removing the need for np.moveaxis. I've spent a few hours attempting to find a solution (I'm guessing it resides in choosing the correct axes), so I thought I'd ask others for help.

You can do it concisely with tensordot if you make image the first argument:
np.tensordot(image, transform, axes=(-1, 1))
You can get better performance from einsum by using the argument optimize=True (requires numpy 1.12 or later):
np.einsum('ij,...j', transform, image, optimize=True)
Or (as Paul Panzer pointed out in a comment), you can simply use matrix multiplication:
image @ transform.T
They all take about the same time on my computer.
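As a quick sanity check, here is a minimal sketch (with a random stand-in image and matrix, not the asker's data) confirming that all three approaches produce the same result:
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((4, 5, 3))    # stand-in for an (m, n, 3) image
transform = rng.random((3, 3))   # stand-in colorspace matrix

a = np.tensordot(image, transform, axes=(-1, 1))
b = np.einsum('ij,...j', transform, image, optimize=True)
c = image @ transform.T

assert np.allclose(a, b) and np.allclose(b, c)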

Related

CV2.approxPolyDP Function Returns an N x 1 x 2 Array Rather Than N x 2

I have a function that draws a polygon around some primary features in images by making use of OpenCV's contour detection. I simplify these contours using the approxPolyDP function in the snippet below to return a closed trapezoid around the regions, and it works fine:
top_poly = cv.approxPolyDP(top_cnt, 0.05 * top_perimeter, closed=True)
top_poly = np.squeeze(top_poly) # get rid of the singleton dimension
However, approxPolyDP returns a strangely shaped ndarray with shape N x 1 x 2, when the expected output, according to the documentation linked below, is an array of N 2D points (N x 2). I had to debug for a while until I found that the singleton dimension can be carved out using np.squeeze, as in the second line of the snippet. Thanks to this answer: What does cv2.approxPolydp() return?
My question is, what is the purpose of this singleton dimension? I worry I might be dropping some useful information and I don't enjoy having to use np.squeeze() in a way I don't completely understand. Thanks for any input that can shed some light on this.
https://docs.opencv.org/4.5.4/d3/dc0/group__imgproc__shape.html#ga0012a5fdaea70b8a9970165d98722b4c
The result shape happens because of OpenCV's requirements, i.e. it has to map numpy arrays to cv::Mat and back.
A cv::Mat is a 2D thing with channels, which can be color (RGB or whatever), 2D points, 3D points, 4D, ..., or any other purpose.
The shape is generally (height, width, channels).
OpenCV returns the points as a column vector (Nx1) of 2-channel data, hence (N, 1, 2).
OpenCV is somewhat tolerant of other shapes, such as (N, 2, 1), which holds the same data as (N, 2): N rows, 2 columns, single-channel.
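To illustrate, here is a small sketch with a toy square contour (hypothetical data, already in the (N, 1, 2) layout OpenCV produces); the singleton axis carries no information and can be dropped safely:
import numpy as np
import cv2 as cv

# A toy closed contour in the (N, 1, 2) layout that findContours would produce
square = np.array([[[0, 0]], [[10, 0]], [[10, 10]], [[0, 10]]], dtype=np.int32)

poly = cv.approxPolyDP(square, epsilon=1.0, closed=True)
print(poly.shape)              # (N, 1, 2): N points, a singleton column, 2 channels (x, y)

points = poly.reshape(-1, 2)   # same effect as np.squeeze(poly, axis=1)
print(points.shape)            # (N, 2)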

What is the fastest way to read in an image to an array of tuples?

I am trying to assign provinces to an area for use in a game mod. I have two separate maps for area and provinces.
provinces file,
area file.
Currently I am reading in an image in Python and storing it in an array using PIL like this:
import numpy as np
from PIL import Image

land_prov_pic = Image.open(INPUT_FILES_DIR + land_prov_str)
land_prov_array = np.array(land_prov_pic)
image_size = land_prov_pic.size
for x in range(image_size[0]):
    if x % 100 == 0:
        print(x)
    for y in range(image_size[1]):
        land_prov_array[x][y] = land_prov_pic.getpixel((x, y))
Where you end up with land_prov_array[x][y] = (R,G,B)
However, this gets really slow, especially for large images. I tried reading it in using OpenCV like this:
import cv2

land_prov_array = cv2.imread(INPUT_FILES_DIR + land_prov_str)
land_prov_array = cv2.cvtColor(land_prov_array, cv2.COLOR_BGR2RGB)  # convert from BGR to RGB
But now land_prov_array[x][y] = [R G B] which is an ndarray and can't be inserted into a set. But it's way faster than the previous for loop. How do I convert [R G B] to (R,G,B) for every element in the array without for loops or, better yet, read it in that way?
EDIT: Added pictures, more description, and code blocks for readability.
It is best to convert the [R,G,B] array to tuple when you need it to be a tuple, rather than converting the whole image to this form. An array of tuples takes up a lot more memory, and will be a lot slower to process, than a numeric array.
The answer by isCzech shows how to create a NumPy view over a 3D array that presents the data as if it were a 2D array of tuples. This might not require the additional memory of an actual array of tuples, but it is still a lot slower to process.
Most importantly, most NumPy functions (such as np.mean) and operators (such as +) cannot be applied to such an array. Thus, one is obliged to iterate over the array in Python code (or with an @np.vectorize function), which is a lot less efficient than using NumPy functions and operators that work on the array as a whole.
For transformation from a 3D array (data3D) to a 2D array (data2D), I've used this approach:
import numpy as np
dt = np.dtype([('x', 'u1'), ('y', 'u1'), ('z', 'u1')])
data2D = data3D.view(dtype=dt).squeeze()
The .view modifies the data type and still returns a 3D array, with the last dimension of size 1, which can then be removed by .squeeze. Alternatively you can use .squeeze(axis=-1) to only squeeze the last dimension (in case some of your other dimensions are of size 1 too).
Please note I've used uint8 ('u1') - your type may be different.
Trying to do this using a loop is very slow, indeed (compared to this approach at least).
Similar question here: Show a 2d numpy array where contents are tuples as an image
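As an example, here is a minimal sketch (with a small stand-in image) of how the structured view supports tuple-like use, such as collecting the unique colours into a set, without a Python-level loop over pixels:
import numpy as np

img = np.zeros((4, 4, 3), dtype=np.uint8)    # stand-in for the RGB map image
img[2:, 2:] = (255, 0, 0)

dt = np.dtype([('r', 'u1'), ('g', 'u1'), ('b', 'u1')])
img2d = img.view(dtype=dt).squeeze(axis=-1)  # shape (4, 4), one record per pixel

unique_colours = set(np.unique(img2d).tolist())  # records convert to tuples
print(unique_colours)                            # {(0, 0, 0), (255, 0, 0)}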

Efficiently Using Multiple Numpy Slices for Random Image Cropping

I have a 4-D numpy array, with the first dimension representing the number of images in a data set, the second and third being the (equal) width and height, and the 4th being the number of channels (3). For example let's say I have 4 color images that are 28*28, so my image data looks like this:
X = np.reshape(np.arange(4*28*28*3), (4,28,28,3))
I would like to select a random 16*16 width x height crop of each of the 4 images. Critically, I want the crop to be different per-image, i.e I want to generate 4 random (x_offset, y_offset) pairs. In the end I want access to an array of shape (4, 16, 16, 3).
If I were to write this in a for loop it would look something like this:
x = np.random.randint(0, 13, 4)  # valid offsets are 0..12, so a 16-pixel window fits in 28
y = np.random.randint(0, 13, 4)
for i in range(X.shape[0]):
    cropped_image = X[i, x[i]:x[i]+16, y[i]:y[i]+16, :]
    # Add cropped image to a list or something
But I'd like to do it as efficiently as possible and I'm wondering if there's a way to do it with strides and fancy indexing. I've seen the answers to this question, but can't quite wrap my head around how I might combine something like stride_tricks with random starting points for the strides on the second and third (width and height) axes.
Leverage strided-based method for efficient patch extraction
We can leverage scikit-image's view_as_windows, which is based on np.lib.stride_tricks.as_strided, to get sliding windows that are merely views into the input array and hence incur no extra memory overhead, making them virtually free! We could surely use np.lib.stride_tricks.as_strided directly, but the setup work required is hard to manage, especially on arrays with higher dimensions. If scikit-image is not available, we can directly use the source code, which works standalone.
Explanation on usage of view_as_windows
The idea with view_as_windows is that we feed in a window_shape arg as a tuple whose length equals the number of dimensions of the input array whose sliding windows are needed. The axes along which we need to slide are fed their respective window lengths, and the rest are fed 1s. This creates an array of views with singleton dims/axes, i.e. axes with length 1, corresponding to the 1s in the window_shape arg. So, for those axes we can index into the zeroth element to get a squeezed version of the sliding windows.
Thus, we would have a solution, like so -
# Get sliding windows
from skimage.util.shape import view_as_windows
w = view_as_windows(X, (1,16,16,1))[...,0,:,:,0]
# Index and get our specific windows
out = w[np.arange(X.shape[0]),x,y]
# If you need those in the same format as in the posted loopy code
out = out.transpose(0,2,3,1)
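As a quick cross-check (assuming scikit-image is installed), the strided result matches the loopy version from the question:
import numpy as np
from skimage.util.shape import view_as_windows

X = np.reshape(np.arange(4*28*28*3), (4, 28, 28, 3))
x = np.random.randint(0, 13, 4)
y = np.random.randint(0, 13, 4)

w = view_as_windows(X, (1, 16, 16, 1))[..., 0, :, :, 0]
out = w[np.arange(X.shape[0]), x, y].transpose(0, 2, 3, 1)

for i in range(X.shape[0]):
    assert np.array_equal(out[i], X[i, x[i]:x[i]+16, y[i]:y[i]+16, :])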

Efficient convolution of two images along one axis

I have two large grayscale images, as either PIL.Image objects or numpy arrays.
How do I do a 1D convolution of the two images along one axis?
The best I've come up with is
def conv2(im1, im2, *args):
    res = 0
    for l1, l2 in zip(im1, im2):
        res += np.convolve(l1, l2, *args)
    return res
This works, but is not extremely fast. Is there a faster way?
Please note that all the 2D convolution functions are probably not relevant, since I am not interested in a 2D convolution. I've seen this question on SO before, but I didn't see a better answer than my code, so I'm bumping it again.
FFT along one axis, multiply, and inverse FFT along the same axis.
This should be MUCH faster according to this explanation.
scipy.signal.fftconvolve should do the job.
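A minimal sketch of that idea (with random stand-in images): fftconvolve's axes argument does the row-wise 1D convolution, and summing over rows reproduces the loop from the question:
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)
im1 = rng.random((200, 300))
im2 = rng.random((200, 300))

fast = fftconvolve(im1, im2, mode='full', axes=1).sum(axis=0)  # FFT-based, row-wise
slow = sum(np.convolve(l1, l2) for l1, l2 in zip(im1, im2))    # the original loop
assert np.allclose(fast, slow)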

Numpy flipped coordinate system

I used OpenCV to do some image processing. I then tried to plot the image on my plotter (origin at lower left); however, the image is flipped. OpenCV's origin is in the upper left, so the y coordinates of the image are flipped.
What function should I apply to my points such that it will plot properly in the new origin system (lower left)?
EDIT:
I am not concerned with changing the plot display, I actually need the points' coordinates flipped.
Using np.flipud did not change the points at all, since the points are stored in an N x 2 matrix (reversing the row order does not change any coordinate values).
The problem does not lie in numpy but in matplotlib's way of displaying data. In order to produce a valid visualization you should flip the y-axis at the image generation level, not in the numpy analysis. It can easily be done through the matplotlib API on the axes object:
plt.gca().invert_yaxis()
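For example, a minimal sketch with a random stand-in image:
import numpy as np
import matplotlib.pyplot as plt

img = np.random.rand(100, 150)  # stand-in for the processed image

fig, ax = plt.subplots()
ax.imshow(img)                  # imshow places the origin at the upper left
ax.invert_yaxis()               # flip the y-axis so the origin is at the lower left
plt.show()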
Are you asking how to flip a numpy array or how to display it?
If you're asking how to flip the array, have a look at np.flipud or equivalently your_data[::-1, ...]
numpy.flipud is a function that uses the exact slicing shown above. However, it's more readable if you're not familiar with numpy slicing.
To explain why data[::-1, ...] will flip the array vertically, you need to understand a bit about how indexing works.
In general, indexing in python works by specifying start:stop:step. Each of these may be None (e.g. :-10 specifies start=None, stop=-10, step=None).
Therefore, ::-1 specifies start=None, stop=None, step=-1 -- in other words, go over the full sequence, but increment with a negative step, effectively reversing the sequence.
... is an Ellipsis. In numpy, this is used to indicate including all other dimensions.
The ellipsis avoids the need to special case your array being 2D or 3D (or 27-dimensional, for that matter). If it's a 2D array, then x[::-1, ...] is equivalent to x[::-1, :]. If it's a 3D array, it's equivalent to x[::-1, :, :], etc.
In numpy, the first axis is rows. Therefore, x[::-1, ...] says "reverse the rows and leave all other dimensions alone." This creates a view, so the memory won't be duplicated and no copy will be made.
In the specific case of rows, you could leave the ellipsis out. However, it's useful to think about for the general case. For example, flipping left-right would be x[:, ::-1, ...] (or np.fliplr).
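A small demonstration of these equivalences:
import numpy as np

x = np.arange(12).reshape(3, 4)

flipped = x[::-1, ...]                  # reverse the rows, leave other dims alone
assert np.array_equal(flipped, np.flipud(x))
assert np.array_equal(x[:, ::-1, ...], np.fliplr(x))
assert np.shares_memory(flipped, x)     # it's a view: no copy was made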
