Related
I am working on a transmission line following algorithm using quadcopters. To do so, I need to calculate the line position on the image I receive from the UAV in order to determine a pitch velocity so the lines can be kept at the center of the image. The problem is that, when I apply a velocity on x-axis to move the UAV to the desired setpoint (left/right moviment), the image plane tilts along with the UAV which increases the positional error incorrectly. The images below exemplify the issue.
I tried something similar to this post since the UAV euler angles is known. This approach reduced the distortion caused by the frame tilting, but I couldn't eliminate it.
Transform a frame to be as if it was taken from above using OpenCV
The code:
f = 692.81 # Focal Length
# Frame Shape
cx = width
cy = height
#Euller Angles
roll_ = 0
pitch_ = pitch
yaw_ = 0
dx = 0
dy = 0
dz = 1
A2 = np.array([[f,0,cx,0],[0,f, cy,0],[0,0,1,0]])
A1 = np.array([[1/f,0,-cx/f],[0,1/f,-cy/f],[0,0,0],[0,0,dz]])
RX = np.array([[1,0,0,0],[0,np.cos(roll_),-(np.sin(roll_)),0],[0, np.sin(roll_), np.cos(roll_),0],[0,0,0,1]])
RY = np.array([[np.cos(pitch_), 0, -np.sin(pitch_),0],[0,1,0,0],[(np.sin(pitch_)), 0, np.cos(pitch_),0],[0,0,0,1]])
RZ = np.array([[np.cos(yaw_), -(np.sin(yaw_)), 0,0],[np.sin(yaw_), np.cos(yaw_), 0,0],[0,0,1,0],[0,0,0,1]])
T = np.array([[1, 0, 0, dx],[0, 1, 0, dy],[0, 0, 1, dz],[0, 0, 0, 1]])
R = np.dot(np.dot(RX, RY), RZ)
H = np.dot(A2, np.dot(T, np.dot(R, A1)))
#The output frame
linha_bw = cv2.warpPerspective(linha_bw, H,(frame.shape[1],frame.shape[0]),None,cv2.INTER_LINEAR)
The results from this transformation can be seen on the graph below. The blue curve is the controller without the image rectification, while the red one is the controller with the code above.
I'm not sure if there is mistakes on my code or there is a better approach to solve my problem through image processing techniques. Any help is highly appreciated !
from scipy import ndimage
height, width, colors = image.shape
transform = [[1, 0, 0],
[0.5, 1, 0],
[0, 0, 1]]
sheared_array = ndimage.affine_transform(image,
transform,
offset=(0, -height*0.7, 0),
output_shape=(height, width*2, colors))
plt.imshow(sheared_array)
My current code does this. My aim is to shear the image by any degree X.
I want to do the same thing with a naive approach. As in, without any pre-defined functions. Just python/numpy code from scratch.
Given the image:
the following code should do what you want to achieve. It copies y-rows of pixels from the numpy array representing the source image to a new created wider image at appropriate x-offsets calculated from the given shear angle. The variable names in a following code are chosen in a way explaining what they are used for providing further details about what the code does:
from PIL import Image
import numpy as np
shearAngleDegrees = 30
PILimg = Image.open('shearNumpyImageByAngle.jpg')
#PILimg.show()
npImg = np.asarray(PILimg)
def shearNpImgByAngle(numpyImageArray, shearAngleDegrees, maxShearAngle=75):
import numpy as np
from math import tan, radians
assert -maxShearAngle <= shearAngleDegrees <= maxShearAngle
ccw = True if shearAngleDegrees > 0 else False # shear counter-clockwise?
imgH, imgW, imgRGBtplItems = npImg.shape
shearAngleRadians = radians(shearAngleDegrees)
imgWplus2imgH = abs(tan(shearAngleRadians)) # (plus in width)/(image height)
imgWplus = int((imgH-1)*imgWplus2imgH) # image width increase in pixels
npImgOut = np.zeros((imgH, imgW+imgWplus, imgRGBtplItems), dtype='uint8')
Wplus, Wplus2H = (0, -imgWplus2imgH) if ccw else (imgWplus,imgWplus2imgH)
for y in range(imgH):
shiftX = Wplus-int(y*Wplus2H)
npImgOut[y][shiftX:shiftX+imgW] = npImg[y]
return npImgOut
#:def
npImgOut = shearNpImgByAngle(npImg, shearAngleDegrees)
PILout = Image.fromarray(npImgOut)
PILout.show()
PILout.save('shearNumpyImageByAngle_shearedBy30deg.jpg')
gives:
As a nice add-on to the above code an extension filling the black edges of the sheared image mirroring the source picture around its sides:
def filledShearNpImgByAngle(npImg, angleDeg, fill=True, maxAngle=75):
import numpy as np
from math import tan, radians
assert -maxAngle <= angleDeg <= maxAngle
ccw = True if angleDeg > 0 else False # shear counter-clockwise?
imgH, imgW, imgRGBtplItems = npImg.shape
angleRad = radians(angleDeg)
imgWplus2imgH = abs(tan(angleRad)) # (plus in width)/(image height)
imgWplus = int((imgH-1)*imgWplus2imgH) # image add. width in pixels
npImgOut = np.zeros((imgH, imgW+imgWplus, imgRGBtplItems),
dtype=npImg.dtype) # 'uint8')
Wplus, Wplus2H = (0, -imgWplus2imgH) if ccw else (imgWplus, imgWplus2imgH)
for y in range(imgH):
shiftXy = Wplus-int(y*Wplus2H)
npImgOut[y][shiftXy:shiftXy+imgW] = npImg[y]
if fill:
assert imgW > imgWplus
npImgOut[y][0:shiftXy] = np.flip(npImg[y][0:shiftXy], axis=0)
npImgOut[y][imgW+shiftXy:imgW+imgWplus] = np.flip(npImg[y][imgW-imgWplus-1+shiftXy:imgW-1], axis=0)
[imgW-x-2]
return npImgOut
#:def
from PIL import Image
import numpy as np
PILimg = Image.open('shearNumpyImageByAngle.jpg')
npImg = np.asarray(PILimg)
shearAngleDegrees = 20
npImgOut = filledShearNpImgByAngle(npImg, shearAngleDegrees)#, fill=False)
shearAngleDegrees = 10
npImgOut = filledShearNpImgByAngle(npImgOut, shearAngleDegrees)#, fill=False)
PILout = Image.fromarray(npImgOut)
PILout.show()
PILout.save('shearNumpyImageByAngle_filledshearBy30deg.jpg')
gives:
or other way around:
I'd like to randomly rotate an image tensor (B, C, H, W) around it's center (2d rotation I think?). I would like to avoid using NumPy and Kornia, so that I basically only need to import from the torch module. I'm also not using torchvision.transforms, because I need it to be autograd compatible. Essentially I'm trying to create an autograd compatible version of torchvision.transforms.RandomRotation() for visualization techniques like DeepDream (so I need to avoid artifacts as much as possible).
import torch
import math
import random
import torchvision.transforms as transforms
from PIL import Image
# Load image
def preprocess_simple(image_name, image_size):
Loader = transforms.Compose([transforms.Resize(image_size), transforms.ToTensor()])
image = Image.open(image_name).convert('RGB')
return Loader(image).unsqueeze(0)
# Save image
def deprocess_simple(output_tensor, output_name):
output_tensor.clamp_(0, 1)
Image2PIL = transforms.ToPILImage()
image = Image2PIL(output_tensor.squeeze(0))
image.save(output_name)
# Somehow rotate tensor around it's center
def rotate_tensor(tensor, radians):
...
return rotated_tensor
# Get a random angle within a specified range
r_degrees = 5
angle_range = list(range(-r_degrees, r_degrees))
n = random.randint(angle_range[0], angle_range[len(angle_range)-1])
# Convert angle from degrees to radians
ang_rad = angle * math.pi / 180
# test_tensor = preprocess_simple('path/to/file', (512,512))
test_tensor = torch.randn(1,3,512,512)
# Rotate input tensor somehow
output_tensor = rotate_tensor(test_tensor, ang_rad)
# Optionally use this to check rotated image
# deprocess_simple(output_tensor, 'rotated_image.jpg')
Some example outputs of what I'm trying to accomplish:
So the grid generator and the sampler are sub-modules of the Spatial Transformer (JADERBERG, Max, et al.). These sub-modules are not trainable, they let you apply a learnable, as well as non-learnable, spatial transformation.
Here I take these two submodules and use them to rotate an image by theta using PyTorch's functions torch.nn.functional.affine_grid and torch.nn.functional.affine_sample (these functions are implementations of the generator and the sampler, respectively):
import torch
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt
def get_rot_mat(theta):
theta = torch.tensor(theta)
return torch.tensor([[torch.cos(theta), -torch.sin(theta), 0],
[torch.sin(theta), torch.cos(theta), 0]])
def rot_img(x, theta, dtype):
rot_mat = get_rot_mat(theta)[None, ...].type(dtype).repeat(x.shape[0],1,1)
grid = F.affine_grid(rot_mat, x.size()).type(dtype)
x = F.grid_sample(x, grid)
return x
#Test:
dtype = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor
#im should be a 4D tensor of shape B x C x H x W with type dtype, range [0,255]:
plt.imshow(im.squeeze(0).permute(1,2,0)/255) #To plot it im should be 1 x C x H x W
plt.figure()
#Rotation by np.pi/2 with autograd support:
rotated_im = rot_img(im, np.pi/2, dtype) # Rotate image by 90 degrees.
plt.imshow(rotated_im.squeeze(0).permute(1,2,0)/255)
In the example above, assume we take our image, im, to be a dancing cat in a skirt:
rotated_im will be a 90-degrees CCW rotated dancing cat in a skirt:
And this is what we get if we call rot_img with theta eqauls to np.pi/4:
And the best part that it's differentiable w.r.t the input and has autograd support! Hooray!
With torchvision it should be simple:
import torchvision.transforms.functional as TF
angle = 30
x = torch.randn(1,3,512,512)
out = TF.rotate(x, angle)
For example if x is:
out with a 30 degree rotation is (NOTE: counterclockwise):
There is a pytorch function for that:
x = torch.tensor([[0, 1],
[2, 3]])
x = torch.rot90(x, 1, [0, 1])
>> tensor([[1, 3],
[0, 2]])
Here are the docs: https://pytorch.org/docs/stable/generated/torch.rot90.html
I implemented FFT-based convolution in Pytorch and compared the result with spatial convolution via conv2d() function. The convolution filter used is an average filter. The conv2d() function produced smoothened output due to average filtering as expected but the fft-based convolution returned a more blurry output.
I have attached the code and outputs here -
spatial convolution -
from PIL import Image, ImageOps
import torch
from matplotlib import pyplot as plt
from torchvision.transforms import ToTensor
import torch.nn.functional as F
import numpy as np
im = Image.open("/kaggle/input/tiger.jpg")
im = im.resize((256,256))
gray_im = im.convert('L')
gray_im = ToTensor()(gray_im)
gray_im = gray_im.squeeze()
fil = torch.tensor([[1/9,1/9,1/9],[1/9,1/9,1/9],[1/9,1/9,1/9]])
conv_gray_im = gray_im.unsqueeze(0).unsqueeze(0)
conv_fil = fil.unsqueeze(0).unsqueeze(0)
conv_op = F.conv2d(conv_gray_im,conv_fil)
conv_op = conv_op.squeeze()
plt.figure()
plt.imshow(conv_op, cmap='gray')
FFT-based convolution -
def fftshift(image):
sh = image.shape
x = np.arange(0, sh[2], 1)
y = np.arange(0, sh[3], 1)
xm, ym = np.meshgrid(x,y)
shifter = (-1)**(xm + ym)
shifter = torch.from_numpy(shifter)
return image*shifter
shift_im = fftshift(conv_gray_im)
padded_fil = F.pad(conv_fil, (0, gray_im.shape[0]-fil.shape[0], 0, gray_im.shape[1]-fil.shape[1]))
shift_fil = fftshift(padded_fil)
fft_shift_im = torch.rfft(shift_im, 2, onesided=False)
fft_shift_fil = torch.rfft(shift_fil, 2, onesided=False)
shift_prod = fft_shift_im*fft_shift_fil
shift_fft_conv = fftshift(torch.irfft(shift_prod, 2, onesided=False))
fft_op = shift_fft_conv.squeeze()
plt.figure('shifted fft')
plt.imshow(fft_op, cmap='gray')
original image -
spatial convolution output -
fft-based convolution output -
Could someone kindly explain the issue?
The main problem with your code is that Torch doesn't do complex numbers, the output of its FFT is a 3D array, with the 3rd dimension having two values, one for the real component and one for the imaginary. Consequently, the multiplication does not do a complex multiplication.
There currently is no complex multiplication defined in Torch (see this issue), we'll have to define our own.
A minor issue, but also important if you want to compare the two convolution operations, is the following:
The FFT takes the origin of its input in the first element (top-left pixel for an image). To avoid a shifted output, you need to generate a padded kernel where the origin of the kernel is the top-left pixel. This is quite tricky, actually...
Your current code:
fil = torch.tensor([[1/9,1/9,1/9],[1/9,1/9,1/9],[1/9,1/9,1/9]])
conv_fil = fil.unsqueeze(0).unsqueeze(0)
padded_fil = F.pad(conv_fil, (0, gray_im.shape[0]-fil.shape[0], 0, gray_im.shape[1]-fil.shape[1]))
generates a padded kernel where the origin is in pixel (1,1), rather than (0,0). It needs to be shifted by one pixel in each direction. NumPy has a function roll that is useful for this, I don't know the Torch equivalent (I'm not at all familiar with Torch). This should work:
fil = torch.tensor([[1/9,1/9,1/9],[1/9,1/9,1/9],[1/9,1/9,1/9]])
padded_fil = fil.unsqueeze(0).unsqueeze(0).numpy()
padded_fil = np.pad(padded_fil, ((0, gray_im.shape[0]-fil.shape[0]), (0, gray_im.shape[1]-fil.shape[1])))
padded_fil = np.roll(padded_fil, -1, axis=(0, 1))
padded_fil = torch.from_numpy(padded_fil)
Finally, your fftshift function, applied to the spatial-domain image, causes the frequency-domain image (the result of the FFT applied to the image) to be shifted such that the origin is in the middle of the image, rather than the top-left. This shift is useful when looking at the output of the FFT, but is pointless when computing the convolution.
Putting these things together, the convolution is now:
def complex_multiplication(t1, t2):
real1, imag1 = t1[:,:,0], t1[:,:,1]
real2, imag2 = t2[:,:,0], t2[:,:,1]
return torch.stack([real1 * real2 - imag1 * imag2, real1 * imag2 + imag1 * real2], dim = -1)
fft_im = torch.rfft(gray_im, 2, onesided=False)
fft_fil = torch.rfft(padded_fil, 2, onesided=False)
fft_conv = torch.irfft(complex_multiplication(fft_im, fft_fil), 2, onesided=False)
Note that you can do one-sided FFTs to save a bit of computation time:
fft_im = torch.rfft(gray_im, 2, onesided=True)
fft_fil = torch.rfft(padded_fil, 2, onesided=True)
fft_conv = torch.irfft(complex_multiplication(fft_im, fft_fil), 2, onesided=True, signal_sizes=gray_im.shape)
Here the frequency domain is about half the size as in the full FFT, but it is only redundant parts that are left out. The result of the convolution is unchanged.
I'm am working on a project at the moment where I am trying to create a Hilbert curve using the Python Imaging Library. I have created a function which will generate new coordinates for the curve through each iteration and place them into various lists which then I want to be able to move, rotate and scale. I was wondering if anyone could give me some tips or a way to do this as I am completely clueless. Still working on the a lot of the code.
#! usr/bin/python
import Image, ImageDraw
import math
# Set the starting shape
img = Image.new('RGB', (1000, 1000))
draw = ImageDraw.Draw(img)
curve_X = [0, 0, 1, 1]
curve_Y = [0, 1, 1, 0]
combinedCurve = zip(curve_X, curve_Y)
draw.line((combinedCurve), fill=(220, 255, 250))
iterations = 5
# Start the loop
for i in range(0, iterations):
# Make 4 copies of the curve
copy1_X = list(curve_X)
copy1_Y = list(curve_Y)
copy2_X = list(curve_X)
copy2_Y = list(curve_Y)
copy3_X = list(curve_X)
copy3_Y = list(curve_Y)
copy4_X = list(curve_X)
copy4_Y = list(curve_Y)
# For copy 1, rotate it by 90 degree clockwise
# Then move it to the bottom left
# For copy 2, move it to the top left
# For copy 3, move it to the top right
# For copy 4, rotate it by 90 degrees anticlockwise
# Then move it to the bottom right
# Finally, combine all the copies into a big list
combinedCurve_X = copy1_X + copy2_X + copy3_X + copy4_X
combinedCurve_Y = copy1_Y + copy2_Y + copy3_Y + copy4_Y
# Make the initial curve equal to the combined one
curve_X = combinedCurve_X[:]
curve_Y = combinedCurve_Y[:]
# Repeat the loop
# Scale it to fit the canvas
curve_X = [x * xSize for x in curve_X]
curve_Y = [y * ySize for y in curve_Y]
# Draw it with something that connects the dots
curveCoordinates = zip(curve_X, curve_Y)
draw.line((curveCoordinates), fill=(255, 255, 255))
img2=img.rotate(180)
img2.show()
Here is a solution working on matrices (which makes sense for this type of calculations, and in the end, 2D coordinates are matrices with 1 column!),
Scaling is pretty easy, just have to multiply each element of the matrix by the scale factor:
scaled = copy.deepcopy(original)
for i in range(len(scaled[0])):
scaled[0][i]=scaled[0][i]*scaleFactor
scaled[1][i]=scaled[1][i]*scaleFactor
Moving is pretty easy to, all you have to do is to add the offset to each element of the matrix, here's a method using matrix multiplication:
import numpy as np
# Matrix multiplication
def mult(matrix1,matrix2):
# Matrix multiplication
if len(matrix1[0]) != len(matrix2):
# Check matrix dimensions
print 'Matrices must be m*n and n*p to multiply!'
else:
# Multiply if correct dimensions
new_matrix = np.zeros(len(matrix1),len(matrix2[0]))
for i in range(len(matrix1)):
for j in range(len(matrix2[0])):
for k in range(len(matrix2)):
new_matrix[i][j] += matrix1[i][k]*matrix2[k][j]
return new_matrix
Then create your translation matrix
import numpy as np
TranMatrix = np.zeros((3,3))
TranMatrix[0][0]=1
TranMatrix[0][2]=Tx
TranMatrix[1][1]=1
TranMatrix[1][2]=Ty
TranMatrix[2][2]=1
translated=mult(TranMatrix, original)
And finally, rotation is a tiny bit trickier (do you know your angle of rotation?):
import numpy as np
RotMatrix = np.zeros((3,3))
RotMatrix[0][0]=cos(Theta)
RotMatrix[0][1]=-1*sin(Theta)
RotMatrix[1][0]=sin(Theta)
RotMatrix[1][1]=cos(Theta)
RotMatrix[2][2]=1
rotated=mult(RotMatrix, original)
Some further reading on what I've done:
http://en.wikipedia.org/wiki/Transformation_matrix#Affine_transformations
http://en.wikipedia.org/wiki/Homogeneous_coordinates
http://www.essentialmath.com/tutorial.htm (concerning all the algebra transformations)
So basically, it should work if you insert those operations inside your code, multiplying your vectors by the rotation / translation matrices
EDIT
I just found this Python library that seems to provide all type of transformations: http://toblerity.org/shapely/index.html