I have the following two tensors:
img is an RGB image of shape (224, 224, 3)
uvs is a tensor of the same spatial size, e.g. (224, 224, 2), that maps to coordinates (x, y). In other words, it provides (x, y) coordinates for every pixel of the input image.
I now want to create a new output image tensor that contains at index (x, y) the value of the input image. So the output should be an image as well, with the pixels rearranged according to the mapping tensor.
Small toy example:
img = [[c1,c2], [c3, c4]] where c is a RGB color [r, g, b]
uvs = [[[0,0], [1,1]],[[0,1], [1,0]]]
out = [[c1, c3], [c4, c2]]
How would one achieve such a thing in pytorch in a fast vectorized manner?
Try with:
out = img[idx[...,0], idx[...,1]]
I was able to solve it (with the help of Quang Hoang's answer):
out[idx[...,0], idx[...,1]] = img
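For reference, here is a minimal runnable sketch of that scatter-style assignment on the toy example above (idx is the uvs tensor from the question, holding integer target rows in [..., 0] and target columns in [..., 1]):

import torch

# 2x2 toy image; c1..c4 are encoded as distinct RGB triples.
img = torch.tensor([[[1, 0, 0], [0, 1, 0]],
                    [[0, 0, 1], [1, 1, 0]]], dtype=torch.float32)   # (2, 2, 3)
# Destination (row, col) for every input pixel, as in the question's uvs.
uvs = torch.tensor([[[0, 0], [1, 1]],
                    [[0, 1], [1, 0]]])                              # (2, 2, 2)

out = torch.zeros_like(img)
out[uvs[..., 0], uvs[..., 1]] = img   # scatter each input pixel to its target location
# out is now [[c1, c3], [c4, c2]], matching the expected result.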
What you need is torch.nn.functional.grid_sample(). You can do something like this:
import torch

width, height, channels = (224, 224, 3)
# Note that the image is channel-first (CHW format). In this example, I'm using a float image, so the values must be in the range (0, 1).
img = torch.rand((channels, height, width))
# Create the indices of shape (224, 224, 2). Any other size would work too.
col_indices = torch.arange(width, dtype=torch.float32)
row_indices = torch.arange(height, dtype=torch.float32)
# grid_sample() expects the last dimension of the grid to be (x, y), i.e. (column, row).
row_grid, col_grid = torch.meshgrid(row_indices, col_indices, indexing='ij')
uvs = torch.stack([col_grid, row_grid], dim=-1)
# Transform the indices from pixel coordinates to the range [-1, 1] such that:
# * top-left corner of the input = (-1, -1)
# * bottom-right corner of the input = (1, 1)
# This is required for grid_sample() to work properly.
# With align_corners=True, pixel centers 0 and (size - 1) map to -1 and 1.
uvs[..., 0] = (uvs[..., 0] / (width - 1)) * 2 - 1
uvs[..., 1] = (uvs[..., 1] / (height - 1)) * 2 - 1
# Do the "mapping" operation (this does a bilinear interpolation) using `uvs` coordinates.
# Note that grid_sample() requires a batch dimension, so need to use `unsqueeze()`, then
# get rid of it using squeeze().
mapped = torch.nn.functional.grid_sample(
img.unsqueeze(0),
uvs.unsqueeze(0),
mode='bilinear',
align_corners=True,
)
# The final image is in HWC format.
result = mapped.squeeze(0).permute(1, 2, 0)
Side note: I found your question by searching for a solution to a related problem I had for a while. While I was writing an answer to your question, I realized what bug was causing the problem I was facing. By helping you I effectively helped myself, so I hope this helps you! :)
Related
I'm trying to calculate image histograms of a numpy array of images. The array of images is of shape (n_images, width, height, colour_channels) and I want to return an array of shape (n_images, count_in_each_bin (i.e. 255)). This is done via two intermediary steps: averaging the colour channels for each image and then flattening each 2D image to a 1D one.
I think I have successfully done this with the code below, however I have cheated a bit with the for loop at the end. My question is this - is there a way of getting rid of the last for loop and using an optimised numpy function instead?
import numpy as np

def histogram_helper(flattened_image: np.array) -> np.array:
    counts, _ = np.histogram(flattened_image, bins=[n for n in range(0, 256)])
    return counts

# Using 10 RGB images of width and height 300
images = np.zeros((10, 300, 300, 3))
# Take the mean of the three colour channels
channel_avg = np.mean(images, axis=3)
# Flatten each image in the array of images, resulting in a 1D representation of each image.
flat_images = channel_avg.reshape(*channel_avg.shape[:-2], -1)
# Now calculate the counts in each of the colour bins for each image in the array.
# This will provide us with a count of how many times each colour appears in an image.
result = np.empty((0, 255), dtype=np.int32)
for image in flat_images:
    colour_counts = histogram_helper(image)
    colour_counts = colour_counts.reshape(1, -1)
    result = np.concatenate([result, colour_counts])
You don't necessarily need to call np.histogram or np.bincount for this, since pixel values are in the range 0 to N. That means that you can treat them as indices and simply use a counter.
Here's how I would transform the initial images, which I imagine are of dtype np.uint8:
images = np.random.randint(0, 255, size=(10, 5, 5, 3)) # 10 5x5 images, 3 channels
reshaped = np.round(images.reshape(images.shape[0], -1, images.shape[-1]).mean(-1)).astype(images.dtype)
Now you can simply count the histograms using unbuffered addition with np.add.at:
result = np.zeros((images.shape[0], 256), int)
index = np.arange(len(images))[:, None]
np.add.at(result, (index, reshaped), 1)
The last operation is in-place and therefore returns None, but the answer will be in result nevertheless.
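As a quick sanity check (a sketch reusing the reshaped and result arrays from the snippet above), the counts agree with a per-image np.bincount:

import numpy as np

# minlength=256 keeps the bin axis aligned with the 256 columns of `result`.
expected = np.stack([np.bincount(row, minlength=256) for row in reshaped])
assert np.array_equal(result, expected)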
I have the following code portion:
dataset = trainDataset()
train_loader = DataLoader(dataset,batch_size=1,shuffle=True)
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
images = []
image_labels = []
for i, data in enumerate(train_loader, 0):
    inputs, labels = data
    inputs, labels = inputs.to(device), labels.to(device)
    inputs, labels = inputs.float(), labels.float()
    images.append(inputs)
    image_labels.append(labels)
image = images[7]
image = image.numpy()
image = image.reshape(416,416,3)
img = Image.fromarray(image,'RGB')
img.show()
The issue is that the image doesn't display properly. For instance, the dataset I have contains images of cats and dogs. But, the image displayed looks as shown below. Why is that?
EDIT 1
So, after @flawr's nice explanation, I have the following:
image = images[7]
image = image[0,...].permute([1,2,0])
image = image.numpy()
img = Image.fromarray(image,'RGB')
img.show()
And, the image looks as shown below. I'm not sure if it is a Numpy thing or the way the image is represented and displayed. I would also like to note that I get a different display of the image at every run, but it is pretty much something close to the image displayed below.
EDIT 2
I think the issue now is with how to represent the image. By referring to this solution, I now get the following:
image = images[7]
image = image[0,...].permute([1,2,0])
image = image.numpy()
image = (image * 255).astype(np.uint8)
img = Image.fromarray(image,'RGB')
img.show()
Which produces the following image as expected :-)
In pytorch you usually represent pictures with tensors of shape
(channels, height, width)
You then seem to reshape it to what you expect would be
(height, width, channels)
Note that these tensors or arrays are actually stored as 1d "array", and the multiple dimensions just come from defining strides (check out How to understand numpy strides for layman?).
In your particular case this means that consecutive values (that were basically values of the same colour channel and the same row) are now interpreted as different colour channels.
So let's say you have a 2x2 image with 3 colour channels. Let's say it is a chessboard pattern. In pytorch that would look something like the following array of shape (3, 2, 2):
[[[1,0],[0,1]],[[1,0],[0,1]],[[1,0],[0,1]]]
The underlying internal array is just
[ 1,0 , 0,1 , 1,0 , 0,1 , 1,0 , 0,1 ]
So reshaping to (2, 2, 3) would look like so:
[[[1,0,0],[1,1,0]],[[0,1,1],[0,0,1]]]
which immediately shows how the image will be completely jumbled. Reshaping really just means setting the brackets in different places!
So what you probably want instead of reshape is permute([1, 2, 0]) (called transpose in numpy), which will actually rearrange the data.
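To make the difference concrete, here is a small sketch contrasting reshape and permute on that chessboard tensor (names are illustrative):

import torch

# The (3, 2, 2) chessboard from above, in channels-first (CHW) layout.
chw = torch.tensor([[[1, 0], [0, 1]],
                    [[1, 0], [0, 1]],
                    [[1, 0], [0, 1]]])

print(chw.reshape(2, 2, 3))   # jumbled: [[[1,0,0],[1,1,0]],[[0,1,1],[0,0,1]]]
print(chw.permute(1, 2, 0))   # correct HWC: every pixel is [1,1,1] or [0,0,0]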
I have an image and I want to split it into multiple images using vertical and horizontal strides, like a sliding window, and the resulting images should all have the same resolution. How can I do that efficiently in Python? I have done this much:
from PIL import Image
def sliding_window(image, stride, imgSize):
    width, height = image.size
    img = []
    for y in range(0, height - imgSize, stride):
        for x in range(0, width - imgSize, stride):
            # Setting the points for cropped image
            left = x
            top = y
            right = x + imgSize
            bottom = y + imgSize
            im1 = image.crop((left, top, right, bottom))
            img.append(im1)
    return img
file = "/home/xxxxxx/yyyyyy.png"
im = Image.open(file)
img = sliding_window(im, 1, 838) # Strides of 1 takes too much time
but this code requires too much RAM and is too time consuming. Please help.
Example:
Sample code: img = sliding_window(im, 200, 300)
The following image is of 800x800 size.
Output:
As you correctly surmised, there is a way to do this with windows that view the original data without copying it. The simplest way is probably to use the relatively new sliding_window_view function:
from numpy.lib.stride_tricks import sliding_window_view
window = sliding_window_view(image, (838, 838), axis=(0, 1))
You don't need an explicit axis for 2D images, but it doesn't hurt and saves you some trouble in the 3D case. If you wanted to adjust the strides, you can just subset the result. For example, for a stride of (3, 4):
window = window[::3, ::4]
Since the window axes must (should) come last in C order, 3D images will have the channels moved to the middle axis. To access the correct shape, you can use something like np.moveaxis or transpose:
np.moveaxis(window[80, 70], 0, -1)
OR
window[80, 70].transpose(1, 2, 0).shape
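Putting it together with the numbers from the question's sample call (window 300, stride 200), a minimal sketch on a hypothetical 800x800 RGB array (in practice you would load it with PIL and np.asarray()):

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

image = np.zeros((800, 800, 3), dtype=np.uint8)   # placeholder for the real image

# All 300x300 windows as views (no copy), keeping every 200th position in each direction.
windows = sliding_window_view(image, (300, 300), axis=(0, 1))[::200, ::200]
print(windows.shape)                          # (3, 3, 3, 300, 300)
patch = np.moveaxis(windows[0, 0], 0, -1)     # back to (300, 300, 3) for saving or display
print(patch.shape)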
I have a set of 3D data (MRI volumes) as .nii images with shapes like 98-by-240-by-342 (98 slices, 240 W and 342 H), for example. The sizes of the volumes vary from each other. I want to center-crop all the volumes in such a way that if the width or height is less than 256, that dimension is padded with zeros. I know this can be done by processing each slice separately, however I am asking whether there is a medical image analysis tool that can crop width and height across a whole volume?
Thanks
ITK, an n-dimensional image processing library, can fulfill your needs. It has pad and crop filters. If it is not clear how to use them, you can take a look at the documentation.
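For illustration, a hedged sketch using SimpleITK (the Python wrapping of ITK); the file name is hypothetical and the padding/cropping split is just one reasonable choice:

import SimpleITK as sitk

img = sitk.ReadImage("volume.nii")       # hypothetical input volume
w, h, d = img.GetSize()                  # SimpleITK reports size as (x, y, z)

# Zero-pad width/height up to 256 if they are smaller...
pad_w, pad_h = max(256 - w, 0), max(256 - h, 0)
img = sitk.ConstantPad(img,
                       [pad_w // 2, pad_h // 2, 0],
                       [pad_w - pad_w // 2, pad_h - pad_h // 2, 0],
                       0)

# ...then center-crop width/height down to 256, keeping all slices.
w, h, d = img.GetSize()
img = sitk.RegionOfInterest(img, [256, 256, d], [(w - 256) // 2, (h - 256) // 2, 0])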
I found an easy way of center cropping on SO which is applicable to N-dimensional arrays: @Losses Don's response, which is a smart way of center cropping. The padding part I added myself.
import operator

def cropND(img, bounding):
    start = tuple(map(lambda a, da: a // 2 - da // 2, img.shape, bounding))
    end = tuple(map(operator.add, start, bounding))
    slices = tuple(map(slice, start, end))
    return img[slices]
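A hedged usage sketch on a random volume (note that cropND only crops; dimensions smaller than the target still need the padding step mentioned above):

import numpy as np

volume = np.random.rand(98, 340, 342)        # hypothetical volume, already large enough
cropped = cropND(volume, (98, 256, 256))
print(cropped.shape)                          # (98, 256, 256)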
You might check this code:
import numpy as np

def resize_image_with_crop_or_pad(image, img_size=(64, 64, 64), **kwargs):
    """Image resizing. Resizes image by cropping or padding dimension
    to fit specified size.

    Args:
        image (np.ndarray): image to be resized
        img_size (list or tuple): new image size
        kwargs (): additional arguments to be passed to np.pad

    Returns:
        np.ndarray: resized image
    """
    assert isinstance(image, (np.ndarray, np.generic))
    assert (image.ndim - 1 == len(img_size) or image.ndim == len(img_size)), \
        'Example size doesnt fit image size'

    # Get the image dimensionality
    rank = len(img_size)

    # Create placeholders for the new shape
    from_indices = [[0, image.shape[dim]] for dim in range(rank)]
    to_padding = [[0, 0] for dim in range(rank)]
    slicer = [slice(None)] * rank

    # For each dimension, find whether it is supposed to be cropped or padded
    for i in range(rank):
        if image.shape[i] < img_size[i]:
            to_padding[i][0] = (img_size[i] - image.shape[i]) // 2
            to_padding[i][1] = img_size[i] - image.shape[i] - to_padding[i][0]
        else:
            from_indices[i][0] = int(np.floor((image.shape[i] - img_size[i]) / 2.))
            from_indices[i][1] = from_indices[i][0] + img_size[i]

        # Create slicer object to crop or leave each dimension
        slicer[i] = slice(from_indices[i][0], from_indices[i][1])

    # Pad the cropped image to extend the missing dimension
    # (indexing with a tuple of slices keeps newer numpy versions happy)
    return np.pad(image[tuple(slicer)], to_padding, **kwargs)
source: Usefull Python codes for MRI images
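A hedged usage sketch for the question's setup (pad W/H up to 256 with zeros, crop anything larger), reusing the function above:

import numpy as np

volume = np.random.rand(98, 240, 342)   # slices, W, H as in the question
resized = resize_image_with_crop_or_pad(volume, img_size=(98, 256, 256),
                                        mode='constant', constant_values=0)
print(resized.shape)                     # (98, 256, 256)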
I am trying to read images from an lmdb dataset, augment each one and then save them into another dataset to be used in my trainings.
These images' axes were initially changed to (3, 32, 32) when they were saved into the lmdb dataset, so in order to augment them I had to transpose them back into their actual shape.
The problem is whenever I try to display them using either matplotlib's show() method or scipy's toimage(), they show a rotated version of the image.
So we have :
img_set = np.transpose(data_train,(0,3,2,1))
#trying to display an image using pyplot, makes it look like this:
plt.subplot(1,2,1)
plt.imshow(img_set[0])
showing the same image using toimage :
Now if I don't transpose data_train, pyplot's show() generates an error while
toimage() displays the image well:
What is happening here?
When I feed the transposed data_train to my augmenter, I also get the result rotated just like previous examples.
Now I'm not sure whether this is a displaying issue, or the actual images are indeed rotated!
What should I do ?
First, look closely. The transposed array is not rotated but mirrored on the diagonal (i.e. the X and Y axes are swapped).
The original shape is (3,32,32), which I interpret as (RGB, X, Y). However, imshow expects an array of shape MxNx3 - the color information must be in the last dimension.
By transposing the array you invert the order of dimensions: (RGB, X, Y) becomes (Y, X, RGB). This is fine for matplotlib because the color information is now in the last dimension but X and Y are swapped, too. If you want to preserve the order of X, Y you can tell transpose to do so:
import numpy as np
img = np.zeros((3, 32, 64)) # non-square image for illustration
print(img.shape) # (3, 32, 64)
print(np.transpose(img).shape) # (64, 32, 3)
print(np.transpose(img, [1, 2, 0]).shape) # (32, 64, 3)
When using imshow to display an image be aware of the following pitfalls:
It treats the image as a matrix, so the dimensions of the array are interpreted as (ROW, COLUMN, RGB), which is equivalent to (VERTICAL, HORIZONTAL, COLOR) or (Y, X, RGB).
It changes direction of the y axis so the upper left corner is img[0, 0]. This is different from matplotlib's normal coordinate system where (0, 0) is the bottom left.
Example:
import matplotlib.pyplot as plt
img = np.zeros((32, 64, 3))
img[1, 1] = [1, 1, 1]  # marking a pixel near the upper left corner white
plt.imshow(img)
Note that the smaller first dimension corresponds to the vertical direction of the image.
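Applied to the question's array, a hedged sketch (assuming data_train has shape (N, 3, 32, 32) with float values in [0, 1]): move only the channel axis to the end so X and Y keep their order:

import numpy as np
import matplotlib.pyplot as plt

data_train = np.random.rand(5, 3, 32, 32)           # placeholder batch in (N, RGB, X, Y)
img_set = np.transpose(data_train, (0, 2, 3, 1))    # (N, X, Y, RGB): channels last, X/Y untouched
plt.imshow(img_set[0])
plt.show()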