Python cv2.remap from mesh creates pixelated distortions

I have a mesh that is 4 times smaller than an image. I want to distort the image with the information from the mesh, but when I use cv2.remap the distortion comes out pixelated (see image below). How could I make the distortion smoother?
Original:
Desired output:
My output:
My code:
img = np.array(Image.open('astronaut.jpg')) # Shape -> (512, 512, 3)
mesh = Mesh('astronaut.msh').get_uvs() # Shape -> (128, 128, 2), 2 channels for x and y
new_mesh = np.zeros((img.shape[1], img.shape[0], 2))
new_mesh[:,:,0] = np.repeat(np.repeat(mesh[:,:,0], 4, axis=0), 4, axis=1)
new_mesh[:,:,1] = np.repeat(np.repeat(mesh[:,:,1], 4, axis=0), 4, axis=1)
nh, nw = img.shape[:2]
xs, ys = np.meshgrid(np.arange(0, nw), np.arange(0, nh))
xs = xs + new_mesh[:,:,0] * 4 # multiply by constant to modulate distort strength
ys = ys + new_mesh[:,:,1] * 4
xs = np.float32(xs)
ys = np.float32(ys)
dst = cv2.remap(img.astype(np.uint8), xs, ys, cv2.INTER_CUBIC)

OpenCV is not to blame. It does exactly what you tell it to.
Those artefacts come from your use of np.repeat. That merely repeats each element in your map array; you're not upsampling your mesh properly, you're effectively just copying 4x4 patches with that code.
Properly upsample your mesh (np.repeat is wrong) and you'll get good results. You can easily do that using cv.resize with anything other than nearest-neighbor interpolation. If you need to control boundary behavior exactly, you'll need warpAffine and a custom transformation matrix. You could even use cv.pyrUp (twice).
When you've presented an MRE (some data for the mesh), I'll update my answer with working code.
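In the meantime, here is a minimal sketch of the resize-based upsampling (not the answerer's promised code), with random stand-ins for the image and mesh shaped as in the question:
import cv2
import numpy as np

# Stand-ins for the question's data: a 512x512x3 image and a 128x128x2 mesh.
img = np.random.randint(0, 255, (512, 512, 3), dtype=np.uint8)
mesh = np.random.randn(128, 128, 2).astype(np.float32)

nh, nw = img.shape[:2]
# Smoothly interpolate the mesh up to full image resolution
# instead of copying 4x4 blocks with np.repeat.
mesh_up = cv2.resize(mesh, (nw, nh), interpolation=cv2.INTER_CUBIC)

xs, ys = np.meshgrid(np.arange(nw, dtype=np.float32), np.arange(nh, dtype=np.float32))
xs = xs + mesh_up[:, :, 0] * 4  # same strength factor as in the question
ys = ys + mesh_up[:, :, 1] * 4
dst = cv2.remap(img, xs, ys, cv2.INTER_CUBIC)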

Related

Element-wise multiplication of a 3D array with a 2D array

I have a portion of an RGB image as a numpy array, the shape of which is (height, width, channel) = (5, 5, 3).
What I want to do with this is to get the sum of element-wise multiplication with a 5x5 kernel matrix, channel by channel. So it should yield a vector of size 3.
My current solution is:
print(portion.shape) # (5, 5, 3)
print(kernel.shape) # (5, 5)
result = np.array([(kernel * portion[:, :, channel]).sum() for channel in range(3)])
print(result.shape) # (3,)
How can I achieve the same result in a more efficient way, hopefully without for-loop?
I'll show here two methods of doing this. The first is basically the "manual" version that relies on broadcasting, which is an important concept to understand for using numpy and similar libraries.
The second uses the Einstein summation convention via np.einsum, which can be quite fast when used right.
import numpy as np
portion = np.zeros((5, 5, 3))
kernel = np.zeros((5, 5))
# broadcasting method:
result = np.sum(kernel[..., None] * portion, axis=(0, 1))
print(result.shape)  # (3,)

# einsum method:
result = np.einsum('ij,ijk->k', kernel, portion)
print(result.shape)  # (3,)

Best way to vectorize generating a batch of randomly rotated matrices in Numpy/PyTorch?

I'd like to generate batches of randomly rotated matrices based on an initial starting matrix (which has a shape of, for example, (4096, 3)), where the rotation applied to each matrix in the batch is randomly chosen from a group of rotation matrices (in my code below, I only want to randomly select from 8 possible rotation angles). Therefore, what I end up with is a tensor of shape (batch_size, 4096, 3).
My current approach is that I pre-make the possible rotated matrices (since I’m only dealing with 8 possible random rotations), and then use a for loop to generate the batch by randomly picking one of the eight pre-made rotated matrices for each item in the batch. This isn’t super efficient, so I was hoping to vectorize the whole process somehow.
Right now, this is how I loop over a batch to generate a batch of rotated matrices one by one:
for view_i in range(batch_size):
    # Get rotated view grid points randomly
    idx = torch.randint(0, 8, (1,))
    pointsf = rotated_points[idx]
In the code below, I generate a pre-made set of random rotation matrices that get randomly selected from in a for-loop over the batch.
The make_3d_grid function generates a (grid_dim * grid_dim * grid_dim, 3) shaped matrix (basically a 2D array of x, y, z coordinate points). The get_rotation_matrix function returns a (3, 3) rotation matrix, where theta is used for rotation around the x-axis.
rotated_points = []
grid_dim = 16
pointsf = make_3d_grid((-1,)*3, (1,)*3, (grid_dim,)*3)
view_angles = torch.tensor([0, np.pi / 4.0, np.pi / 2.0, 3 * np.pi / 4.0, np.pi, 5 * np.pi / 4.0, 3 * np.pi / 2.0, 7 * np.pi / 4.0])
for i in range(len(view_angles)):
    theta = view_angles[i]
    rot = get_rotation_matrix(theta, torch.tensor(0.0), torch.tensor(0.0))
    pointsf_rot = torch.mm(pointsf, rot)
    rotated_points.append(pointsf_rot)
Any help in vectorizing this would be greatly appreciated! If code for this can be done in Numpy that works fine too, since I can convert it to PyTorch myself.
You can pre-generate your rotation matrices as a (batch_size, 3, 3) array, and then multiply by your (N, 3) points array broadcasted to (batch_size, N, 3).
rotated_points = np.dot(pointsf, rots)
np.dot will sum-product over the last axis of pointsf and the second-to-last axis of rots, putting the dimensions of pointsf first. This means that your result will be of shape (N, batch_size, 3) rather than (batch_size, N, 3). You can of course fix this with a simple axis swap:
rotated_points = np.dot(pointsf, rots).transpose(1, 0, 2)
OR
rotated_points = np.swapaxes(np.dot(pointsf, rots), 0, 1)
I would suggest, however, that you make rots be the inverse (transposed) rotation matrices from what you had before. In that case, you can just compute:
rotated_points = np.dot(transposed_rots, pointsf.T)  # result has shape (batch_size, 3, N)
You should be able to convert np.dot to torch.mm fairly trivially.
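For instance, here is a minimal self-contained NumPy sketch of the vectorized idea, using np.einsum for the batched product and made-up stand-ins for make_3d_grid / get_rotation_matrix from the question (x-axis rotations at the 8 angles):
import numpy as np

rng = np.random.default_rng(0)
pointsf = rng.standard_normal((4096, 3))   # stand-in for the grid points

# Pre-build the 8 rotation matrices about the x-axis.
angles = np.arange(8) * np.pi / 4.0
c, s = np.cos(angles), np.sin(angles)
rotation_mats = np.zeros((8, 3, 3))
rotation_mats[:, 0, 0] = 1.0
rotation_mats[:, 1, 1], rotation_mats[:, 1, 2] = c, -s
rotation_mats[:, 2, 1], rotation_mats[:, 2, 2] = s, c

batch_size = 16
idx = rng.integers(0, 8, size=batch_size)  # one random rotation per batch item
rots = rotation_mats[idx]                  # (batch_size, 3, 3)

# Batched product in one call, no Python loop: (batch_size, 4096, 3)
rotated_points = np.einsum('nk,bkm->bnm', pointsf, rots)
print(rotated_points.shape)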

Is there a way to stack multiple 2D (numpy) image arrays about a specified point?

I need to stack many images that are represented by 2D numpy arrays of the same shape (i.e., take the sum or the median of them all). However, as I stack them, they need to be aligned properly -- each image, while the same shape, is all black with a small circular object near the center, but not exactly at the center. I can find the coordinates of the centroid for each image (using SourceProperties.centroid from the photutils package), but these coordinates will be different for each image -- they are also subpixel coordinates (example: (y, x) = (203.018, 207.397)).
I do not know of a way to simply move the objects to the center of the arrays, given that the centroids have subpixel coordinates, so it seems like it would be more straightforward if there were a way to align each one by its unique centroid coordinates as I stack them... in other words:
import numpy as np
# First image = array1, shape = (400, 400)
centroid1 = (203.018, 207.397)
# Second image = array2, shape = (400, 400)
centroid2 = (205.256, 199.312)
array_list = [array1, array2]
>>> stacked = np.median(array_list, axis=0) # but while setting centroid1 = centroid2 so that the two centroid points exactly overlap while computing median
But I'm not really sure how this would look in code. Is this possible?
Step 1: ignore the subpixel/fractional part, as it makes no sense for arrays. An array cannot be shifted by 0.34 elements to the right.
Step 2: roll arrays to place the centroids consistently.
Step 3: stack them.
This is illustrated by the code below, which places the centroids at the geometric center of the array.
centroid1 = (203.018, 207.397)
centroid2 = (205.256, 199.312)
centroid1 = np.round(centroid1).astype(int)
centroid2 = np.round(centroid2).astype(int)
center = np.array(array1.shape)//2
array1_rolled = np.roll(array1, center-centroid1, (0, 1))
array2_rolled = np.roll(array2, center-centroid2, (0, 1))
array_list = [array1_rolled, array2_rolled]
stacked = np.median(array_list, axis=0)
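For completeness, a self-contained sketch of the same approach with synthetic data (the blob positions are made up to roughly match the question's centroids):
import numpy as np

# Two synthetic 400x400 "images": all black with a small bright blob.
array1 = np.zeros((400, 400))
array2 = np.zeros((400, 400))
array1[201:206, 205:210] = 1.0   # centroid ~ (203, 207)
array2[203:208, 197:202] = 1.0   # centroid ~ (205, 199)

centroid1 = np.round((203.018, 207.397)).astype(int)
centroid2 = np.round((205.256, 199.312)).astype(int)
center = np.array(array1.shape) // 2

array1_rolled = np.roll(array1, center - centroid1, (0, 1))
array2_rolled = np.roll(array2, center - centroid2, (0, 1))

stacked = np.median([array1_rolled, array2_rolled], axis=0)
print(stacked[198:203, 198:203])  # 5x5 block of ones: both blobs now overlap at the center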

How to normalize a 4D numpy array?

I have a three dimensional numpy array of images (CIFAR-10 dataset). The image array shape is like below:
a = np.random.rand(32, 32, 3)
Before I do any deep learning, I want to normalize the data to get better results. With a 1D array, I know we can do min-max normalization like this:
v = np.random.rand(6)
(v - v.min())/(v.max() - v.min())
Out[68]:
array([0.89502294, 0.        , 1.        , 0.65069468, 0.63657915, 0.08932196])
However, when it comes to a 3D array, I am totally lost. Specifically, I have the following questions:
Along which axis do we take the min and max?
How do we implement this with the 3D array?
I appreciate your help!
EDIT:
It turns out I need to work with a 4D Numpy array with shape (202, 32, 32, 3), so the first dimension would be the index for the image, and the last 3 dimensions are the actual image. It'll be great if someone can provide me with the code to normalize such a 4D array. Thanks!
EDIT 2:
Thanks to @Eric's code below, I've figured it out:
x_min = x.min(axis=(1, 2), keepdims=True)
x_max = x.max(axis=(1, 2), keepdims=True)
x = (x - x_min)/(x_max-x_min)
Assuming you're working with image data of shape (W, H, 3), you should probably normalize over each channel (axis=2) separately, as mentioned in the other answer.
You can do this with:
# keepdims makes the result shape (1, 1, 3) instead of (3,). This doesn't matter here, but
# would matter if you wanted to normalize over a different axis.
v_min = v.min(axis=(0, 1), keepdims=True)
v_max = v.max(axis=(0, 1), keepdims=True)
(v - v_min)/(v_max - v_min)
Along which axis do we take the min and max?
To answer this we probably need more information about your data, but in general, when discussing 3-channel images for example, we would normalize using the per-channel min and max. This means we would perform the normalization three times, once per channel.
Here's an example:
import numpy

img = numpy.random.randint(0, 100, size=(10, 10, 3))  # Generating some random numbers
img = img.astype(numpy.float32) # converting array of ints to floats
img_a = img[:, :, 0]
img_b = img[:, :, 1]
img_c = img[:, :, 2] # Extracting single channels from 3 channel image
# The three lines above could also be replaced with cv2.split(img), which returns 3 numpy arrays (using OpenCV)
# normalizing per channel data:
img_a = (img_a - numpy.min(img_a)) / (numpy.max(img_a) - numpy.min(img_a))
img_b = (img_b - numpy.min(img_b)) / (numpy.max(img_b) - numpy.min(img_b))
img_c = (img_c - numpy.min(img_c)) / (numpy.max(img_c) - numpy.min(img_c))
# putting the 3 channels back together:
img_norm = numpy.empty((10, 10, 3), dtype=numpy.float32)
img_norm[:, :, 0] = img_a
img_norm[:, :, 1] = img_b
img_norm[:, :, 2] = img_c
Edit: It just occurred to me that once you have the single-channel data (a 32x32 image, for instance) you can simply use sklearn's normalize (note that by default it scales each row to unit L2 norm, which is not the same as min-max scaling):
from sklearn.preprocessing import normalize
img_a_norm = normalize(img_a)
How do we work with the 3D array?
Well, this is a bit of a big question. If you need functions like array-wise min and max, I would use the NumPy versions. Indexing, for instance, is done per axis, as you can see in my example above.
Also, please refer to NumPy's ndarray documentation (https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html) to learn more. They really have an amazing set of tools for n-dimensional arrays.
There are different approaches here. You can either decide to normalize over the whole batch of images or normalize per single image. To do that you can either use the mean of a single image or use the mean of the whole batch of images or use a fixed mean from another dataset - e.g. you can use the ImageNet mean value.
If you want to do the same as TensorFlow's tf.image.per_image_standardization, you should standardize each image with its own mean and adjusted standard deviation. So you loop through all images and do the normalization over all axes of a single image like this:
import math
import numpy as np
from PIL import Image
# open images
image_1 = Image.open("your_image_1.jpg")
image_2 = Image.open("your_image_2.jpg")
images = [image_1, image_2]
images = np.array(images)
standardized_images = []
# standardize images
for image in images:
    mean = image.mean()
    stddev = image.std()
    adjusted_stddev = max(stddev, 1.0 / math.sqrt(image.size))
    standardized_image = (image - mean) / adjusted_stddev
    standardized_images.append(standardized_image)
standardized_images = np.array(standardized_images)

How to apply calculations to elements of a multidimensional matrix in Tensorflow?

I'm relatively new to Python and even more so to Tensorflow so I've been working through some tutorials such as this tutorial. A challenge given was to make an image greyscale. One approach taken here is to just take one colour channel value and duplicate it across all channels. Another is to take an average, which can be achieved using tf.reduce_mean as done here. However there are many ways to make an image monochromatic, as anyone who has played with GIMP or Photoshop will know. One standard method adjusts for the way humans perceive colour and requires that the three colour channels are individually weighted this way:
Grey = (Red * 0.2126 + Green * 0.7152 + Blue * 0.0722)
Anyway I've achieved it by doing this:
import tensorflow as tf
import numpy as np
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
filename = "MarshOrchid.jpg"
raw_image_data = mpimg.imread(filename)
image = tf.placeholder("float", [None, None, 3])
r = tf.slice(image,[0,0,0],[-1,-1,1])
g = tf.slice(image,[0,0,1],[-1,-1,1])
b = tf.slice(image,[0,0,2],[-1,-1,1])
r = tf.scalar_mul(0.2126,r)
g = tf.scalar_mul(0.7152,g)
b = tf.scalar_mul(0.0722,b)
grey = tf.add(r,tf.add(g,b))
out = tf.concat(2, [grey, grey, grey])
out = tf.cast(out, tf.uint8)
with tf.Session() as session:
    result = session.run(out, feed_dict={image: raw_image_data})
plt.imshow(result)
plt.show()
This seems hugely inelegant to me: having to cut up the data, apply the calculations, and then recombine them. A matrix multiplication on individual RGB tuples would be efficient, or barring that, a function that takes an individual RGB tuple and returns a greyscaled tuple. I've looked at tf.map_fn but can't seem to make it work for this.
Any suggestions or improvements?
How about this?
img = tf.ones([100, 100, 3])
r, g, b = tf.unstack(img, axis=2)
grey = r * 0.2126 + g * 0.7152 + b * 0.0722
out = tf.stack([grey, grey, grey], axis=2)
out = tf.cast(out, tf.uint8)
A sample of map_fn. The shape of x is (2, 4), so the shape of elms_fn is (4,); if the shape of x were (100, 100, 3), the shape of elms_fn would be (100, 3).
x = tf.constant([[1, 2, 3, 4],
                 [5, 6, 7, 8]], dtype=tf.float32)

def avg_fc(elms_fn):
    # shape of elms_fn is (4,)
    # compute average for each row and return it
    avg = tf.reduce_mean(elms_fn)
    return avg

# map_fn will stack avg at axis 0
res = tf.map_fn(avg_fc, x)

with tf.Session() as sess:
    a = sess.run(res)  # [2.5, 6.5]
So having really looked into this topic, in the current release of tensorflow (r0.12) there doesn't appear to be a simple way to apply custom functions to tuples of values, especially if the result does not amount to a reduction. As with my initial effort and the answer from @xxi, you pretty much have to disaggregate the tuples before applying a function to them collectively.
I figured out another way to get the result I wanted without slicing or unstacking, instead using reshaping and matrix multiplication:
import tensorflow as tf
import numpy as np
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
filename = "MarshOrchid.jpg"
raw_image_data = mpimg.imread(filename)
image = tf.placeholder("float", [None, None, 3])
out = tf.reshape(image, [-1,3])
out = tf.matmul(out,[[0.2126, 0, 0], [0, 0.7152, 0], [0, 0, 0.0722]])
out = tf.reduce_sum(out, 1, keep_dims=True)
out = tf.concat(1, [out, out, out])
out = tf.reshape(out, tf.shape(image))
out = tf.cast(out, tf.uint8)
with tf.Session() as session:
    result = session.run(out, feed_dict={image: raw_image_data})
plt.imshow(result)
plt.show()
This worked for the narrow purpose of greyscaling an image but doesn't really give a design pattern to apply for dealing with more generic calculations.
Out of curiosity I profiled these three methods in terms of execution time and memory usage. So which was better?
Method 1 - Slicing: 1.6 seconds & 1.0 GiB memory usage
Method 2 - Unstacking: 1.6 seconds & 1.1 GiB memory usage
Method 3 - Reshape: 1.4 seconds & 1.2 GiB memory usage
So no major differences in performance but interesting nonetheless.
In case you were wondering why the process is so slow, the image used is 5528 x 3685 pixels. But yeah, it's still pretty slow compared to GIMP and the like.
