Suppose that I have an input x of size [H, W], and also mu_x and mu_y (which may be fractional) representing the number of pixels to shift in the x and y directions. Is there any efficient way in PyTorch, without using C++, to shift the tensor x by mu_x and mu_y units with bilinear interpolation?
To be more precise, let's say we have an image with mu_x = 5 and mu_y = 3: we want to shift the image so that it moves rightward 5 pixels and downward 3 pixels, with the pixels that fall outside the [H, W] boundary removed and the new pixels introduced at the other end of the boundary set to 0. With fractional mu_x and mu_y, however, we need bilinear interpolation to estimate the resulting image.
Is it possible to implement this with pure PyTorch tensor operations, or do I need to use C++?
I believe you can achieve this by applying grid sampling on your original input, using a grid to guide the sampling process. If you take a coordinate grid of your image and sample using that grid, the resulting image will be equal to the original image. However, you can apply a shift to this grid and therefore sample with the given shift. Grid sampling works with floating-point grids, which means you can apply an arbitrary non-integer shift to your image and choose a sampling mode (bilinear is the default).
This can be implemented out of the box with F.grid_sample. Given an image tensor img, we first construct a pixel grid of that image using torch.meshgrid. Keep in mind the grid used by the sampler must be normalized to [-1, 1]: pixel (x=0, y=0) should be mapped to (-1, -1), pixel (x=w-1, y=h-1) to (1, 1), and the center pixel ends up at around (0, 0). Also note that in the grid's last dimension, grid_sample expects the x (width) coordinate first and the y (height) coordinate second.
Use two torch.arange calls with a [0, 1] normalization, followed by a remapping to [-1, 1]:
>>> c, h, w = img.shape
>>> ys, xs = torch.arange(h)/(h-1), torch.arange(w)/(w-1)
>>> gy, gx = torch.meshgrid(ys, xs, indexing='ij')
>>> grid = torch.stack((gx, gy), dim=-1)*2 - 1  # (h, w, 2), last dim = (x, y)
The resulting grid has a shape of (h, w, 2), and its spatial dimensions (h, w) are the dimensions of the output image produced by the sampling process.
Since we are not working with batched elements, we need to unsqueeze a singleton batch dimension on both img and grid. Then we can apply F.grid_sample (align_corners=True matches the normalization by h-1 and w-1 above):
>>> sampled = F.grid_sample(img[None], grid[None], align_corners=True)
Following this, you can apply your arbitrary mu_x, mu_y shift, and even extend this easily to batches of images and shifts. You define the shift by building a shifted grid:
>>> ys_s, xs_s = (torch.arange(h) - mu_y)/(h-1), (torch.arange(w) - mu_x)/(w-1)
where mu_x and mu_y are the values in pixels (floating point) by which the image is shifted on the horizontal and vertical axes respectively; subtracting them from the sampling coordinates moves the image content rightward and downward, as requested. To acquire the sampled image, apply F.grid_sample on a grid made up of xs_s and ys_s:
>>> gy_s, gx_s = torch.meshgrid(ys_s, xs_s, indexing='ij')
>>> grid_shifted = torch.stack((gx_s, gy_s), dim=-1)*2 - 1
>>> sampled = F.grid_sample(img[None], grid_shifted[None], align_corners=True)
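Putting it together, here is a minimal self-contained sketch; the helper name shift_bilinear, the zero padding mode and align_corners=True are my choices, matching the zero-fill behaviour asked for in the question:
import torch
import torch.nn.functional as F

def shift_bilinear(img, mu_x, mu_y):
    # img: (c, h, w) float tensor; mu_x, mu_y: shifts in pixels (may be fractional)
    c, h, w = img.shape
    ys = (torch.arange(h, dtype=img.dtype, device=img.device) - mu_y) / (h - 1)
    xs = (torch.arange(w, dtype=img.dtype, device=img.device) - mu_x) / (w - 1)
    gy, gx = torch.meshgrid(ys, xs, indexing='ij')
    grid = torch.stack((gx, gy), dim=-1) * 2 - 1          # (h, w, 2), last dim = (x, y)
    out = F.grid_sample(img[None], grid[None],
                        mode='bilinear', padding_mode='zeros',
                        align_corners=True)               # out-of-bounds pixels become 0
    return out[0]

# example: shift right by 5.3 px and down by 2.7 px
# shifted = shift_bilinear(img, mu_x=5.3, mu_y=2.7)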
I am trying to associate RGB values to pixel coordinates after having done a perspective projection. The equation for the perspective projection is
s * [x, y, 1]^T = K * [X, Y, Z]^T,  i.e.  x = fx*X/Z + cx,  y = fy*Y/Z + cy,
where x and y are the pixel locations of the point, X, Y, and Z are the locations of the point in the camera frame, and the other parameters denote the intrinsic camera parameters. Given a point cloud containing the point locations and RGB values, I would like to associate RGB values to pixel locations according to the perspective projection.
The following code should create the correct image:
import matplotlib.pyplot as plt
import open3d as o3d
import numpy as np
cx = 325.5;
cy = 253.5;
fx = 518.0;
fy = 519.0;
K = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]])
pcd = o3d.io.read_point_cloud('freiburg.pcd', remove_nan_points=True)
points = np.array(pcd.points)
colors = np.array(pcd.colors)
projection = (K @ points.T).T
normalization = projection / projection[:, [2]]  # last element must be 1
pixel_coordinates = normalization.astype(int)
img = np.zeros((480, 640, 3))
#how can I fill the img appropriately? The matrix pixel coordinates should
# inform about where to place the color intensities.
for position, intensity in zip(pixel_coordinates, colors):
    row, column = position[0], position[1]
    #img[row, column, :] = intensity # returns with error
    img[column, row, :] = intensity # gives a strange picture.
The point cloud can be read here. I expect to be able to associate the rgb values in the last loop:
for position, intensity in zip(pixel_coordinates, colors):
    row, column = position[0], position[1]
    #img[row, column, :] = intensity # returns with error
    img[column, row, :] = intensity # gives a strange picture.
Strangely, if the second-to-last line is uncommented, the program raises an IndexError while attempting to write an RGB value outside the range of available columns. The last line in the loop, however, runs without problems. The generated picture and the correct picture can be seen below:
How can I modify the code above to obtain the correct image?
A couple of issues:
You are ignoring the nonlinear distortion in the projection. Are the images you are comparing to undistorted? If they are, are you sure your projection matrix K is the one associated with the undistorted image?
Projecting the 3D points will inevitably produce a point cloud on the image plane, not a continuous image. To produce a somewhat natural image you likely need to interpolate nearby samples in the 2D point cloud, and your choice of interpolation filter determines the quality of the result. For example, you could first make an image of RGB buckets and a similar image of weights, project the 3D points, and place their RGB values in the closest bucket (the one obtained by rounding the projected x, y coordinates), with a weight equal to the reciprocal of the distance of the projection from the bucket's center (i.e. the reciprocal of the Euclidean norm of the rounding residuals). You then compute the output pixel values as weighted averages at each bucket and, if there are any unfilled buckets, fill them by (say) bilinear interpolation of the filled neighbors. That last step will fill 1-pixel holes surrounded by already filled values; for larger holes you will need to choose some kind of infill procedure. A rough sketch of this splatting scheme is given below.
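For illustration only, a minimal numpy sketch of the bucket/weight splatting described above; the function name, the 480x640 output size and the eps guard are assumptions, not part of the answer:
import numpy as np

def splat_points(pixel_xy, colors, height=480, width=640, eps=1e-6):
    # pixel_xy: (N, 2) float array of projected (x, y) coordinates
    # colors:   (N, 3) array of RGB values
    rgb_buckets = np.zeros((height, width, 3))
    weights = np.zeros((height, width))

    cols = np.round(pixel_xy[:, 0]).astype(int)   # x -> column
    rows = np.round(pixel_xy[:, 1]).astype(int)   # y -> row
    valid = (rows >= 0) & (rows < height) & (cols >= 0) & (cols < width)

    # weight = reciprocal of the distance between projection and bucket center
    residuals = pixel_xy[valid] - np.column_stack((cols[valid], rows[valid]))
    w = 1.0 / (np.linalg.norm(residuals, axis=1) + eps)

    np.add.at(rgb_buckets, (rows[valid], cols[valid]), colors[valid] * w[:, None])
    np.add.at(weights, (rows[valid], cols[valid]), w)

    filled = weights > 0
    rgb_buckets[filled] /= weights[filled][:, None]
    return rgb_buckets, filled   # unfilled pixels still need infilling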
We need to detect whether the images produced by our tunable lens are blurred or not.
We want to find a proxy measure for blurriness.
My current thinking is to first apply a Sobel filter along the x direction, because the jumps or stripes are mostly along this direction, then compute the x-direction marginal means, and finally compute the standard deviation of these marginal means.
We expect this standard deviation to be larger for a clear image and smaller for a blurred one, because a clear image should have larger jumps in pixel values.
But we get the opposite result. How could we improve this blurriness measure?
import cv2
import matplotlib.pyplot as plt

def sobel_image_central_std(PATH):
    # use the blue channel
    img = cv2.imread(PATH)[:, :, 0]
    # extract the central part of the image
    hh, ww = img.shape
    hh2 = hh // 2
    ww2 = ww // 2
    hh4 = hh // 4
    ww4 = ww // 4
    img_center = img[hh4:(hh2 + hh4), ww4:(ww2 + ww4)]
    # Sobel operator along x
    sobelx = cv2.Sobel(img_center, cv2.CV_64F, 1, 0, ksize=3)
    # column-wise (marginal) means of the gradient, then their spread
    x_marginal = sobelx.mean(axis=0)
    plt.plot(x_marginal)
    return x_marginal.std()
(Example images: Blur #1, Blur #2, Clear #1, Clear #2.)
In general:
Is there a way to detect if an image is blurry?
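As a baseline worth comparing against (my suggestion, not something proposed in this thread), a widely used blurriness proxy is the variance of the Laplacian, which tends to be larger for sharper images; a minimal sketch:
import cv2

def laplacian_variance(path):
    # higher variance of the Laplacian response usually means a sharper image
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    return cv2.Laplacian(gray, cv2.CV_64F).var()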
You can combine this calculation with your other question, where you are searching for the central angle.
Once you have the angle (and the center, which may be outside of the image) you can apply an axis transformation to remove the circular component of the cone. Instead you get x (radius) and y (angle), where y runs along the circular arcs.
Maybe you can get the center of the image from the camera set-up.
Then you don't need to calculate it using the intersection of the edges from the central angle. Or just do it manually once if it is fixed for all images.
Look at polar coordinate systems.
Due to the shape of the cone the image will be denser at the peak, but this should be a fixed factor. It will probably bias the result when calculating the blurriness along the transformed image.
What you could do to correct this is create a synthetic cone image with circular lines and apply the transformation to it. Again, this requires some trial and error.
But it should give you a mask that you can use to correct the "blurriness bias". A sketch of the polar unrolling itself follows below.
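As a rough sketch of that axis transformation: OpenCV's cv2.warpPolar can do the radius/angle remapping, assuming the center and a maximum radius are already known (both are placeholders here):
import cv2

def unroll_circular(img, center, max_radius):
    # map to polar coordinates: rows ~ angle, columns ~ radius,
    # so circular arcs around `center` become roughly horizontal lines
    h, w = img.shape[:2]
    return cv2.warpPolar(img, (w, h), center, max_radius,
                         cv2.INTER_LINEAR + cv2.WARP_POLAR_LINEAR)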
I have several images which fully overlap the same scene, but there is a small shift between all the images, something like 1 px or less, i.e. a sub-pixel shift. Call this problem (1): how can I estimate this sub-pixel shift between 2 images? (Actually, I know how, and I describe my approach below.) I used Python here.
In addition to problem (1), there is problem (2), which is about a non-uniform shift across the full image. Given image A and image B, at the top left image A is shifted by about 1 px from image B along the x and y axes, but at the center image A is shifted by 0.5 px from image B, also along the x and y axes. The shift between images A and B is not uniform over the surface of the image. The problem is how to estimate this non-uniform shift, let's call it a shift surface, for all the pixels of all images (taking one as reference). (I also have a solution for this, explained below.)
Finally, problem (3) is about shifting the image with the estimated shift surface (calculated in (2)). I know how to shift an image by, for example, 0.5 px on the x axis and 1.2 px on the y axis, but I don't know how to shift an array with a specific shift for each pixel.
My solutions:
Problem (1): This problem can be solved using cross-correlation in Fourier space. A function already exists for this in the scikit-image library, register_translation (now phase_cross_correlation), reference here; I just need to give it two images and the float precision I want.
Problem (2): Remember, the shift is not uniform over the surface of the image. What I did is basically assume that, over a 500x500 px window, the shift is uniform and can be estimated as in problem (1). So I calculated the shift over the whole surface of the image with 500x500 px windows and a step of 100 px, which gives me the non-uniform shift estimated at a set of sample points. I can then interpolate a surface from these pointwise shift estimates, which gives me an estimated shift for each pixel of the image. To do that, I have to interpolate a surface with the same resolution as the image; I did it using scipy.interpolate.griddata, for both components (x and y). I have thus estimated the non-uniform shift over the whole surface of the image. (A rough sketch of this windowed estimation is given below.)
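For illustration, a rough sketch of this windowed estimation, assuming skimage.registration.phase_cross_correlation (the successor of register_translation) and scipy.interpolate.griddata; the window and step sizes are the ones mentioned above:
import numpy as np
from skimage.registration import phase_cross_correlation
from scipy.interpolate import griddata

def estimate_shift_surface(ref, mov, win=500, step=100, upsample=100):
    h, w = ref.shape
    centers, shifts = [], []
    for r in range(0, h - win, step):
        for c in range(0, w - win, step):
            # sub-pixel shift of this window (problem (1))
            shift, _, _ = phase_cross_correlation(ref[r:r+win, c:c+win],
                                                  mov[r:r+win, c:c+win],
                                                  upsample_factor=upsample)
            centers.append((r + win // 2, c + win // 2))
            shifts.append(shift)              # (dy, dx) for this window
    centers, shifts = np.array(centers), np.array(shifts)

    # interpolate the pointwise estimates to a per-pixel shift surface (problem (2))
    gy, gx = np.mgrid[0:h, 0:w]
    dy = griddata(centers, shifts[:, 0], (gy, gx), method='linear')
    dx = griddata(centers, shifts[:, 1], (gy, gx), method='linear')
    return dy, dx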
Problem (3): I now want to apply this shift to the whole image. I don't know how to do that. To shift an image at sub-pixel precision, you can use a function from scipy.ndimage named fourier_shift (you can find it here), but you can only give a single shift for the whole image. Here, I want to give a shift for each pixel of the image.
Do you have any ideas to solve problem (3)? Also, if you think there is a simpler way to solve problems (1) and (3), that would still be useful! For information, I have 7 images of 16000x26000 px, so it takes some time to solve problem (2) the way I do it.
You now need to interpolate the original image at locations (x + x_shift(x,y), y + y_shift(x,y)). Likely scipy.interpolate.interpn is the most efficient way to do this.
I think your code would look something like this (not tested):
import numpy as np
import scipy.interpolate
# ... (load data, find shifts, etc.)
input_coords = (np.arange(x_size), np.arange(y_size))
output_coords = np.column_stack((
    (x_shift + input_coords[0][:, None]).ravel(),
    (y_shift + input_coords[1][None, :]).ravel()))
output_image = scipy.interpolate.interpn(input_coords, original_image, output_coords,
                                         method='linear', bounds_error=False)
output_image = output_image.reshape(original_image.shape)  # back to a 2D image
I want to apply Otsu thresholding to image gradients (to remove noise). After that, I want to compute the gradients orientation. Unfortunately, when I do so, I only get gradient orientations between 0 and 90 degrees. Without Otsu thresholding, the values are between 0 and 360.
See my code in Python
import numpy as np
import cv2
img = cv2.imread('Ob.png',cv2.IMREAD_GRAYSCALE)
img = img.astype('float32')
dst1 = cv2.Sobel(img,cv2.CV_64F,1,0,ksize=5)
dst2 = cv2.Sobel(img,cv2.CV_64F,0,1,ksize=5)
ret1,th1 = cv2.threshold(dst1.astype(np.uint8),0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
ret2,th2 = cv2.threshold(dst2.astype(np.uint8),0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
mag, ang = cv2.cartToPolar(th1.astype(np.float32),th2.astype(np.float32))
ang_deg = np.rad2deg(ang)
What is happening in your code is quite simple to explain:
dst1 and dst2, the output of the two Sobel filters, are the x and y components of the gradient vector. For one given pixel, the gradient vector is given by (dst1[i,j], dst2[i,j]). This vector can have any values, for example (5.8,-2.1), leading to an angle of about 340 degrees.
Next, you threshold these two images. Otsu thresholding will find a value for which the image is nicely separated into pixels of low intensity and pixels of high intensity. These are assigned values of 0 and 255, respectively. But first, you convert the floating-point images to uint8, setting all negative values to 0. So, our vector (5.8,-2.1) is first converted to (5,0), and then thresholded, after which it becomes either (255,0) or (0,0) depending on what side of the threshold the 5 falls.
Thus, we have converted the vector with an angle of 340 degrees to one with an angle of 0 degrees or no computable angle (though atan2(0,0) typically yields 0 also).
In fact, all vectors have become either (0,0), (0,255), (255,0) or (255,255), meaning that you will only find angles of 0, 45 and 90 degrees.
What you should do instead is compute the magnitude, and threshold that (I don't know if Otsu is the ideal method for such an image). Next, use only the angle for those pixels where the magnitude is above the threshold.
Another common alternative is to use Gaussian gradients instead of Sobel. There, you can set a smoothing (regularization) parameter, which allows you to remove more or less noise. I often see this implemented as a Gaussian blur followed by the Sobel filters, though it makes more sense to me to directly use Gaussian derivative filters.
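A minimal sketch of that suggestion (the Gaussian sigma and the use of Otsu on the rescaled magnitude are assumptions, not prescriptions):
import numpy as np
import cv2

img = cv2.imread('Ob.png', cv2.IMREAD_GRAYSCALE).astype(np.float32)
img = cv2.GaussianBlur(img, (0, 0), 2)               # optional regularization (sigma = 2)

dx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=5)
dy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=5)
mag, ang = cv2.cartToPolar(dx, dy, angleInDegrees=True)

# threshold the magnitude (Otsu needs an 8-bit image, so rescale first)
mag8 = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
thresh, mask = cv2.threshold(mag8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# keep the orientation only where the gradient is strong enough
ang_masked = np.where(mask > 0, ang, np.nan)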
If I may ask: why is the first thing you do converting the data to float32? I think it would be more efficient to simply let the Sobel call do it (through its output depth argument). That is just my point of view.
The thing you call "noise" in the result of the gradient filter actually consists of non-maxima. Algorithms such as Canny often threshold the result after the Sobel filtering; the inconvenience of this approach is finding the appropriate thresholds. Personally, I use the non-maxima suppression of another algorithm.
Your code would become:
import numpy as np
import cv2
img = cv2.imread('Ob.png',cv2.IMREAD_GRAYSCALE)
dx, dy = cv2.spatialGradient(img, ksize=3)  # spatialGradient only supports ksize=3
mag = cv2.magnitude(dx.astype(np.float32), dy.astype(np.float32))
mag = cv2.normalize(mag, None, 0, 1, cv2.NORM_MINMAX)  # scale to [0, 1] like an edge map
se = cv2.ximgproc.createStructuredEdgeDetection('model.yml')  # needs a pretrained model file (path is a placeholder)
ori = se.computeOrientation(mag)
edges_nms = se.edgesNms(mag, ori)
I hope it helps you.
I have an image, i.e. an array of pixel values, let's say 5000x5000 (this is the typical size). Now I want to expand it by a factor of 2 to 10k x 10k: the value of pixel (0,0) goes to (0,0), (0,1), (1,0) and (1,1) in the expanded image.
After that I rotate the expanded image using scipy.ndimage.rotate (I believe there is no faster way given the size of my array).
Next I have to resize this 10k x 10k array back to the original size, i.e. 5k x 5k. To do this I take the average of the pixel values at (0,0), (0,1), (1,0) and (1,1) in the expanded image and put it in (0,0) of the new image.
However, this whole procedure turns out to be expensive and takes a lot of time given the size of my array. Is there a faster way to do it?
I am using the following code to expand the original image
import numpy as np

# assume the original image original_img is already given
largeImg = np.zeros((10000, 10000), dtype=np.float32)
for j in range(5000):
    for k in range(5000):
        pixel_value = original_img[j][k]
        for x in range(2*k, 2*(k+1)):
            for y in range(2*j, 2*(j+1)):
                largeImg[y][x] = pixel_value
A similar method is used to reduce the image to original size after rotation.
In numpy you can use repeat:
large_img = original_img.repeat(2, axis=1).repeat(2, axis=0)
and
final_img = 0.25 * rotated_img.reshape(5000,2,5000,2).sum(axis=(3,1))
Or use scipy.ndimage.zoom; this can give you smoother results than the numpy methods.
There is a nice library that probably has all the functions you need for handling images, including rotate:
http://scikit-image.org/docs/dev/api/skimage.transform.html#skimage.transform.rotate
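For reference, a minimal end-to-end sketch of the upsample / rotate / block-average pipeline using the snippets above, assuming scipy.ndimage.rotate and a square image (the angle is a placeholder):
import numpy as np
from scipy import ndimage

def rotate_with_supersampling(original_img, angle_deg):
    n = original_img.shape[0]                       # assume a square n x n image
    large = original_img.repeat(2, axis=0).repeat(2, axis=1)
    rotated = ndimage.rotate(large, angle_deg, reshape=False, order=1)
    # average each 2x2 block to come back to the original resolution
    return 0.25 * rotated.reshape(n, 2, n, 2).sum(axis=(3, 1))

# example: small = rotate_with_supersampling(original_img, 30.0)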