I made a small example for myself to play around with OpenCV's warpPerspective, but the output is not completely as I expected.
My input is a bar at a 45° angle. I want to transform it so that it's vertically aligned / at a 90° angle. No problem with that. However, what I don't understand is that everything around the actual destination points is black. The reason I don't understand this is that only the transformation matrix gets passed to the warpPerspective function, not the destination points themselves. So my expected output would be a bar at a 90° angle and most of the area around it to be yellow instead of black. Where's my error in reasoning?
# imports needed by the snippets below
import cv2
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# helper function
def showImage(img, title):
    fig = plt.figure()
    plt.suptitle(title)
    plt.imshow(img)
# read and show test image
img = mpimg.imread('test_transform.jpg')
showImage(img, "input image")
# source points
top_left = [194,430]
top_right = [521,103]
bottom_right = [549,131]
bottom_left = [222,458]
pts = np.array([bottom_left,bottom_right,top_right,top_left])
# target points
y_off = 400  # y offset
top_left_dst = [top_left[0], top_left[1] - y_off]
top_right_dst = [top_left_dst[0] + 39.6, top_left_dst[1]]
bottom_right_dst = [top_right_dst[0], top_right_dst[1] + 462.4]
bottom_left_dst = [top_left_dst[0], bottom_right_dst[1]]
dst_pts = np.array([bottom_left_dst, bottom_right_dst, top_right_dst, top_left_dst])
# generate a preview to show where the warped bar would end up
preview=np.copy(img)
cv2.polylines(preview,np.int32([dst_pts]),True,(0,0,255), 5)
cv2.polylines(preview,np.int32([pts]),True,(255,0,255), 1)
showImage(preview, "preview")
# calculate transformation matrix
pts = np.float32(pts.tolist())
dst_pts = np.float32(dst_pts.tolist())
M = cv2.getPerspectiveTransform(pts, dst_pts)
# warp image and draw the resulting image
image_size = (img.shape[1], img.shape[0])
warped = cv2.warpPerspective(img, M, dsize = image_size, flags = cv2.INTER_LINEAR)
showImage(warped, "warped")
The result using this code is:
Here's my input image test_transform.jpg:
And here is the same image with coordinates added:
By request, here is the transformation matrix:
[[ 6.05504680e-02 -6.05504680e-02 2.08289910e+02]
[ 8.25714275e+00 8.25714275e+00 -5.12245707e+03]
[ 2.16840434e-18 3.03576608e-18 1.00000000e+00]]
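As a quick sanity check (my own addition, reusing the M and np already defined in the code above): the matrix does map a source corner onto its intended destination, e.g. top_left = (194, 430) onto top_left_dst = (194, 30):
p = np.array([194, 430, 1.0])   # top_left in homogeneous coordinates
x, y, w = M @ p
print(x / w, y / w)             # approximately (194.0, 30.0), i.e. top_left_dst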
The ordering of the points in your arrays, or their positions, might be the fault. Check this transformed image: the dst_pts array is np.array([[196,492],[233,494],[234,32],[196,34]]), which is more or less the blue rectangle in your preview image. (I picked the coordinates myself to make sure they are right.)
NOTE: Your source and destination points should be in the right order.
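A minimal sketch of the fix (the destination coordinates are the ones quoted above; the key point is that src and dst list the corners in the same order, here bottom_left, bottom_right, top_right, top_left, and img is assumed to be the input image from the question):
import cv2
import numpy as np

src = np.float32([[222,458], [549,131], [521,103], [194,430]])   # bottom_left, bottom_right, top_right, top_left
dst = np.float32([[196,492], [233,494], [234,32], [196,34]])     # same corner order
M = cv2.getPerspectiveTransform(src, dst)
warped = cv2.warpPerspective(img, M, (img.shape[1], img.shape[0]))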
In the image below, we see a defined world plane coordinate system (X,Y,0) where Z=0. As we can see, the camera is heading towards the defined world plane.
The world reference point is located at the top left of the grid (0,0,0). The distance between every two yellow points is 40 cm.
I've calibrated my camera using the checkerboard and then used the built-in function cv2.solvePnP in order to estimate the rotation and translation vector of the camera with respect to my defined world coordinates. The results are as follows:
tvec_cam= [[-5.47884374]
[-3.08581371]
[24.15112048]]
rvec_cam= [[-0.02823308]
[ 0.08623225]
[ 0.01563199]]
According to the results, (tx,ty,tz) seems to be right, as the camera is located in the negative quadrant of the X,Y world coordinates.
However, I'm getting confused when interpreting the rotation vector.
Does the resulting rotation vector say that the camera axes are almost aligned with the world coordinate axes (meaning almost no rotation)?
If yes, how could this be true? According to OpenCV's camera coordinate convention, the Z-axis of the camera points towards the scene (which means towards the world plane), the X-axis points towards the image right (which means opposite to the X world axis), and the Y-axis of the camera points towards the image bottom (which also means opposite to the Y world axis).
Moreover, what is the unit of tvec?
Note: I've illustrated the orientation of the defined world coordinate axes according to the result of the translation vector (both tx and ty are negative).
The code I used for computing the rotation and translation vectors is shown below:
import cv2 as cv
import numpy as np
WPoints = np.zeros((9*3,3), np.float64)
WPoints[:,:2] = np.mgrid[0:9,0:3].T.reshape(-1,2)*0.4
imPoints=np.array([[20,143],[90,143],[161,143],[231,144],[303,144],
[374,144],[446,145],[516,146],[587,147],[18,214],[88,214]
,[159,215],[230,215],[302,216],[374,216],[446,216]
,[517,217],[588,217],[16,285],[87,285],[158,286],[229,287],
[301,288]
,[374,289],[446,289],[518,289],[589,289]],dtype=np.float64)
#load the camera matrix, e.g. [[4.38073915e+03 0.00000000e+00 1.00593352e+03]
#                              [0.00000000e+00 4.37829226e+03 6.97020491e+02]
#                              [0.00000000e+00 0.00000000e+00 1.00000000e+00]]
with np.load('parameters_cam1.npz') as X:
    mtx, dist, _, _ = [X[i] for i in ('mtx','dist','rvecs','tvecs')]
ret, rvecs, tvecs = cv.solvePnP(WPoints, imPoints, mtx, dist)
np.savez("extrinsic_camera1.npz", rvecs=rvecs, tvecs=tvecs)
print(tvecs)
print(rvecs)
cv.destroyAllWindows()
The code for estimating the intrinsics is shown below:
import numpy as np
import cv2
import glob
import argparse
import pathlib
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--path", required=True, help="path to images folder")
ap.add_argument("-e", "--file_extension", required=False, default=".jpg",
help="extension of images")
args = vars(ap.parse_args())
path = args["path"] + "*" + args["file_extension"]
# termination criteria
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
# prepare object points, like (0,0,0), (0.03,0,0), (0.06,0,0) ....,
#(0.18,0.12,0)
objp = np.zeros((5*7,3), np.float32)
objp[:,:2] = np.mgrid[0:7,0:5].T.reshape(-1,2)*0.03
#print(objp)
# Arrays to store object points and image points from all the images.
objpoints = [] # 3d point in real world space
imgpoints = [] # 2d points in image plane.
#images = glob.glob('left/*.jpg') #read a series of images
images = glob.glob(path)
path = 'foundContours'
#pathlib.Path(path).mkdir(parents=True, exist_ok=True)
found = 0
for fname in images:
    img = cv2.imread(fname) # Capture frame-by-frame
    #print(images[im_i])
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Find the chess board corners
    ret, corners = cv2.findChessboardCorners(gray, (7,5), None)
    # print(corners)
    # If found, add object points, image points (after refining them)
    if ret == True:
        print('true')
        objpoints.append(objp) # Certainly, every loop objp is the same, in 3D.
        #print('obj_point',objpoints)
        corners2 = cv2.cornerSubPix(gray,corners,(11,11),(-1,-1),criteria)
        # print(corners2)
        imgpoints.append(corners2)
        print(imgpoints)
        print('first_point',imgpoints[0])
        #print(imgpoints.shape())
        # Draw and display the corners
        img = cv2.drawChessboardCorners(img, (7,5), corners2, ret)
        found += 1
        cv2.imshow('img', img)
        cv2.waitKey(1000)
        # if you want to save images with detected corners
        # uncomment following 2 lines and lines 5, 18 and 19
        image_name = path + '/calibresult' + str(found) + '.jpg'
        cv2.imwrite(image_name, img)
print("Number of images used for calibration: ", found)
# When everything done, release the capture
# cap.release()
cv2.destroyAllWindows()
#calibration
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints,
gray.shape[::-1],None,None)
#save parameters needed in undistortion
np.savez("parameters_cam1.npz",mtx=mtx,dist=dist,rvecs=rvecs,tvecs=tvecs)
np.savez("points_cam1.npz",objpoints=objpoints,imgpoints=imgpoints)
print ("Camera Matrix = |fx 0 cx|")
print (" | 0 fy cy|")
print (" | 0 0 1|")
print (mtx)
print('distortion coefficients=\n', dist)
print('rotation vector for each image=', *rvecs, sep = "\n")
print('translation vector for each image=', *tvecs, sep= "\n")
Hope you can help me understand this.
Thanks in advance.
First, rvec is in axis-angle representation (https://en.wikipedia.org/wiki/Axis%E2%80%93angle_representation).
You can obtain the rotation matrix using cv2.Rodrigues(). For your data, I get almost the identity:
[[ 0.99616253 -0.01682635 0.08588995]
[ 0.01439347 0.99947963 0.02886672]
[-0.08633098 -0.02751969 0.99588635]]
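A minimal sketch of that conversion, using the rvec from your results:
import cv2
import numpy as np

rvec = np.array([[-0.02823308], [0.08623225], [0.01563199]])
R, _ = cv2.Rodrigues(rvec)   # 3x3 rotation matrix, close to the identity here
print(R)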
Now, according to the directions of x and y in your picture, the z-axis points downwards (carefully apply the right-hand rule). This explains why the z-axis of the camera is almost aligned with the z-axis of your world reference frame.
Edit: Digging a little bit further, from the code you posted:
WPoints = np.zeros((9*3,3), np.float64)
WPoints[:,:2] = np.mgrid[0:9,0:3].T.reshape(-1,2)*0.4
The values for X and Y are all positive and increment to the right and to the bottom respectively, so you are indeed using the usual convention: X increases to the right and Y increases downwards. What's wrong is only the arrows you drew in the picture.
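A quick way to see this from the code itself (my own check, reusing the WPoints definition above): the first rows show X growing along a row of points and Y growing from one row to the next:
import numpy as np

WPoints = np.zeros((9*3,3), np.float64)
WPoints[:,:2] = np.mgrid[0:9,0:3].T.reshape(-1,2)*0.4
print(WPoints[:3])    # [[0. 0. 0.] [0.4 0. 0.] [0.8 0. 0.]]  -> X increases to the right
print(WPoints[9:11])  # [[0. 0.4 0.] [0.4 0.4 0.]]            -> Y increases downwards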
Edit: Concerning the interpretation of the translation vector: in the OpenCV convention, the points in the local camera reference frame are obtained from the points in the world reference frame like this:
|x_cam|          |x_world|
|y_cam| = Rmat * |y_world| + tvec
|z_cam|          |z_world|
With this convention, tvec is the position of the world origin in the camera reference frame. What's more easily interpretable is the position of the camera origin in the world reference frame, which can be obtained as:
cam_center = -(R_inv * tvec)
Where R_inv is the inverse of the rotation matrix. Here the rotation matrix is almost the identity, so a quick approximation would be -tvec, which is (5.4, 3.1, -24.1).
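A minimal, self-contained sketch of that computation, reusing the rvec and tvec from your results (the unit remark in the comment is my own note: solvePnP expresses tvec in the units of the object points, which is metres here since the grid spacing is 0.4):
import cv2
import numpy as np

rvec = np.array([[-0.02823308], [0.08623225], [0.01563199]])
tvec = np.array([[-5.47884374], [-3.08581371], [24.15112048]])

R, _ = cv2.Rodrigues(rvec)
cam_center = -R.T @ tvec     # R.T equals R_inv because R is a rotation matrix
print(cam_center.ravel())    # camera origin expressed in the world frame, in metres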
I would like to be able to make a certain shape in either a PIL image or an OpenCV image 3 times larger and smaller without changing the resolution of the image or changing the shape of the shape I want to make larger. I have tried using OpenCV's dilation method, but that is not its intended use, plus it changed the shape of the image. For an example:
Thanks.
Here's a way of doing it:
find the interesting shape, i.e. non-white ROI area
extract it
scale it up by a factor
clear the original image to white
paste the scaled ROI back into image with same centre
#!/usr/bin/env python3
import cv2
import numpy as np
if __name__ == "__main__":
    # Open image
    orig = cv2.imread('image.png', cv2.IMREAD_COLOR)
    # Get extent of interesting part, i.e. non-white part
    y, x, _ = np.nonzero(~orig)
    y0, y1 = np.min(y), np.max(y)   # top and bottom rows
    x0, x1 = np.min(x), np.max(x)   # left and right cols
    h, w = y1-y0, x1-x0             # height and width
    ROI = orig[y0:y1, x0:x1]        # extract ROI
    cv2.imwrite('ROI.png', ROI)     # DEBUG only
    # Upscale ROI
    factor = 3
    scaledROI = cv2.resize(ROI, (w*factor, h*factor), interpolation=cv2.INTER_NEAREST)
    newH, newW = scaledROI.shape[:2]
    # Clear original image to white
    orig[:] = [255,255,255]
    # Get centre of original shape, and position of top-left of ROI in output image
    cx, cy = (x0 + x1)//2, (y0 + y1)//2
    top = cy - newH//2
    left = cx - newW//2
    # Paste in rescaled ROI
    orig[top:top+newH, left:left+newW] = scaledROI
    cv2.imwrite('result.png', orig)
That transforms this:
to this:
Puts me in mind of a pantograph:
I have a set of arbitrary images. Half the images are pictures, half are masks defining ROIs.
In the current version of my program I use the ROI to crop the image (i.e. I extract the rectangle in the image matching the bounding box of the ROI mask). The problem is, the ROI mask isn't perfect, and it's better to over-predict than under-predict in my case.
So I want to copy more than the ROI rectangle, but if I do this, I may be trying to crop out of the image.
i.e:
x, y, w, h = cv2.boundingRect(mask_contour)
img = img[int(y-h*0.05):int(y + h * 1.05), int(x-w*0.05):int(x + w * 1.05)]
can fail because it tries to access out of bounds pixels. I could just clamp the values, but I wanted to know if there is a better approach
You can add a border using OpenCV:
import cv2 as cv
import random
src = cv.imread('/home/stephen/lenna.png')
borderType = cv.BORDER_REPLICATE
borderSize = .5
top = int(borderSize * src.shape[0]) # shape[0] = rows
bottom = top
left = int(borderSize * src.shape[1]) # shape[1] = cols
right = left
value = [random.randint(0,255), random.randint(0,255), random.randint(0,255)]
dst = cv.copyMakeBorder(src, top, bottom, left, right, borderType, None, value)
cv.imshow('img', dst)
c = cv.waitKey(0)
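To tie this back to the question, a sketch of how the padded image could then be cropped with the enlarged rectangle (assuming img, x, y, w, h come from the question's cv2.boundingRect code; the pad just has to be at least as large as the overshoot):
pad = int(0.05 * max(w, h)) + 1   # enough to cover the 5% overshoot on every side
padded = cv.copyMakeBorder(img, pad, pad, pad, pad, cv.BORDER_REPLICATE)
# all coordinates shift by the pad amount, so the enlarged crop can no longer go out of bounds
crop = padded[pad + int(y - h*0.05) : pad + int(y + h*1.05),
              pad + int(x - w*0.05) : pad + int(x + w*1.05)]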
Maybe you could try to limit the coordinates beforehand. Please see the code below:
[ymin, ymax] = [max(0, int(y - h*0.05)), min(img.shape[0], int(y + h*1.05))]
[xmin, xmax] = [max(0, int(x - w*0.05)), min(img.shape[1], int(x + w*1.05))]
img = img[ymin:ymax, xmin:xmax]
I'm working on a depth map with OpenCV. I can obtain it, but it is reconstructed from the left camera origin, and that camera has a slight tilt; as you can see in the figure, the depth is "shifted" (the depth should be close, with no horizontal gradient):
I would like to express it as seen from a zero angle. I tried with the warpPerspective function as you can see below, but I obtain a null field...
P = np.dot(cam,np.dot(Transl,np.dot(Rot,A1)))
dst = cv2.warpPerspective(depth, P, (2048, 2048))
with :
#Projection 2D -> 3D matrix
A1 = np.zeros((4,3))
A1[0,0] = 1
A1[0,2] = -1024
A1[1,1] = 1
A1[1,2] = -1024
A1[3,2] = 1
#Rotation matrix around the Y axis
theta = np.deg2rad(5)
Rot = np.zeros((4,4))
Rot[0,0] = np.cos(theta)
Rot[0,2] = -np.sin(theta)
Rot[1,1] = 1
Rot[2,0] = np.sin(theta)
Rot[2,2] = np.cos(theta)
Rot[3,3] = 1
#Translation matrix on the X axis
dist = 0
Transl = np.zeros((4,4))
Transl[0,0] = 1
Transl[0,2] = dist
Transl[1,1] = 1
Transl[2,2] = 1
Transl[3,3] = 1
#Camera intrinsics matrix 3D -> 2D
cam = np.concatenate((C1,np.zeros((3,1))),axis=1)
cam[2,2] = 1
P = np.dot(cam,np.dot(Transl,np.dot(Rot,A1)))
dst = cv2.warpPerspective(Z0_0, P, (2048*3, 2048*3))
EDIT LATER:
You can download the 32MB field dataset here: https://filex.ec-lille.fr/get?k=cCBoyoV4tbmkzSV5bi6. Then, load and view the image with:
from matplotlib import pyplot as plt
import numpy as np
img = np.load('testZ0.npy')
plt.imshow(img)
plt.show()
I have got a rough solution in place. You can modify it later.
I used the mouse handling operations available in OpenCV to crop the region of interest in the given heatmap.
(Did I just say I used a mouse to crop the region?) Yes, I did. To learn more about mouse functions in OpenCV SEE THIS. Besides, there are many other SO questions that can help you in this regard.:)
Using those functions I was able to obtain the following:
Now to your question of removing the tilt. I used the homography principle by taking the corner points of the image above and mapping them onto a 'white' image of a definite size. I used the cv2.findHomography() function for this.
Now using the cv2.warpPerspective() function in OpenCV, I was able to obtain the following:
Now you can apply the required scale to this image as you wanted.
CODE:
I have also attached some snippets of code for your perusal:
#First I created an image of white color of a definite size
back = np.ones((435, 379, 3)) # size
back[:] = (255, 255, 255) # white color
Next I obtained the corner points pts_src on the tilted image below:
pts_src = np.array([[25.0, 2.0],[403.0,22.0],[375.0,436.0],[6.0,433.0]])
I wanted the points above to be mapped to the points pts_dst given below:
pts_dst = np.array([[2.0, 2.0], [379.0, 2.0], [379.0, 435.0],[2.0, 435.0]])
Now I used the principle of homography:
h, status = cv2.findHomography(pts_src, pts_dst)
Finally I mapped the original image to the white image using perspective transform.
fin = cv2.warpPerspective(img, h, (back.shape[1],back.shape[0]))
# img -> original tilted image.
# back -> image of white color.
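Putting the snippets together, a minimal end-to-end sketch (the file names are placeholders, and img is assumed to be the cropped, tilted image obtained above):
import cv2
import numpy as np

img = cv2.imread('cropped_tilted.png')          # placeholder file name

# destination canvas: white image of a definite size
back = np.ones((435, 379, 3), dtype=np.uint8)
back[:] = (255, 255, 255)

# corners of the tilted content and where they should map to
pts_src = np.array([[25.0, 2.0], [403.0, 22.0], [375.0, 436.0], [6.0, 433.0]])
pts_dst = np.array([[2.0, 2.0], [379.0, 2.0], [379.0, 435.0], [2.0, 435.0]])

h, status = cv2.findHomography(pts_src, pts_dst)
fin = cv2.warpPerspective(img, h, (back.shape[1], back.shape[0]))
cv2.imwrite('untilted.png', fin)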
Hope this helps! I also got to learn a great deal from this question.
Note: The points fed to cv2.findHomography() must be floats.
For more info on homography, visit THIS PAGE.
I am trying to simulate an image standing out of a marker. This is my code so far which does what is pictured. Essentially, I just want to rotate the image to appear to stand out orthogonal to the checkerboard.
As you can see, I use the code to find the transformation matrix between a normalized square image and the corresponding checkerboard corners. I then use warpPerspective to get the image you see. I know that I can use the rotation vectors from solvePnP to obtain a rotation matrix through Rodrigues(), but I don't know what the next step is.
def transformTheSurface(inputFrame):
    ret, frameLeft = capleft.read()
    capGray = cv2.cvtColor(frameLeft, cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(capGray, (5,4), None, cv2.CALIB_CB_NORMALIZE_IMAGE + cv2.CALIB_CB_ADAPTIVE_THRESH) #,None,cv2.CALIB_CB_FAST_CHECK)
    if (found):
        npGameFrame = pygame.surfarray.array3d(inputFrame)
        inputFrameGray = cv2.cvtColor(npGameFrame, cv2.COLOR_BGR2GRAY)
        cv2.drawChessboardCorners(frameLeft, (5,4), corners, found)
        q = corners[[0, 4, 15, 19]]
        ret, rvecs, tvecs = cv2.solvePnP(objp, corners, mtx, dist)
        ptMatrix = cv2.getPerspectiveTransform(muffinCoords, q)
        npGameFrame = cv2.flip(npGameFrame, 0)
        ptMatrixWithXRot = ptMatrix * rodRotMat[0]
        #inputFrameConv = cv2.cvtColor(npGameFrame, cv2.COLOR_BGRA2GRAY)
        transMuffin = cv2.warpPerspective(npGameFrame, ptMatrix, (640, 480)) #, muffinImg, cv2.INTER_NEAREST, cv2.BORDER_CONSTANT, 0)
EDIT:
I have added some more code, in hopes of creating my own 3x3 transformation matrix. I used the following reference. Here is my code for that:
#initialization happens earlier in code
muffinCoords = np.zeros((4,2), np.float32)
muffinCoords[0] = (0,0)
muffinCoords[1] = (200,0)
muffinCoords[2] = (0,200)
muffinCoords[3] = (200,200)
A1 = np.zeros((4,3), np.float32)
A1[0] = (1,0,322)
A1[1] = (0,1,203)
A1[2] = (0,0,0)
A1[3] = (0,0,1)
R = np.zeros((4,4), np.float32)
R[3,3] = 1.0
T = np.zeros((4,4), np.float32)
T[0] = (1,0,0,0)
T[1] = (0,1,0,0)
T[2] = (0,0,1,0)
T[3] = (0,0,0,1)
#end initialization
#load calib data derived using cv2.calibrateCamera, my Fx and Fy are about 800
loadedCalibFileMTX = np.load('calibDataMTX.npy')
mtx = np.zeros((3,4), np.float32)
mtx[:3,:3] = loadedCalibFileMTX
#this is new to my code, creating what I interpret as Rx*Ry*Rz
ret, rvecCalc, tvecs = cv2.solvePnP(objp, corners, loadedCalibFileMTX, dist)
rodRotMat = cv2.Rodrigues(rvecCalc)
R[:3,:3] = rodRotMat[0]
#then I create T
T[0,3] = tvecs[0]
T[1,3] = tvecs[1]
T[2,3] = tvecs[2]
# CREATING CUSTOM TRANSFORM MATRIX
# A1 -> 2d to 3d projection matrix
# R-> rotation matrix as calculated by solve PnP, or Rx * Ry * Rz
# T -> converted translation matrix, reference from site, vectors pulled from tvecs of solvePnP
# mtx -> 3d to 2d matrix
# customTransformMat = mtx * (T * (R * A1)) {this is intended calculation of following}
first = np.dot(R, A1)
second = np.dot(T, first)
finalCalc = np.dot(mtx, second)
finalNorm = finalCalc/(finalCalc[2,2]) # to make sure that the [2,2] element is 1
transMuffin = cv2.warpPerspective(npGameFrame, finalNorm, (640, 480), None, cv2.INTER_NEAREST, cv2.BORDER_CONSTANT, 0)
#transMuffin is returned as undefined here, any help?
# using the cv2.getPerspectiveTransform method to find what you can find pictured at the top
ptMatrix = cv2.getPerspectiveTransform( muffinCoords, q)
I finally figured out the right methodology. You can find the code here: https://github.com/mikezucc/augmented-reality-fighter-pygame
NOTICE:
Almost ALL of the game code is written by Leif Theiden and is under the license specified in the .py files. The code relevant to the computer vision is in states.py. I used the game just to show that it can be done, for others looking to get started in simple computer vision.
My code opens a thread every time a new surface (simply a frame, in PyGame) is called to be displayed on the main window. I start a thread at that point and execute a simple computer vision function that does the following:
Searches a camera stream frame for the 5x4 chessboard (cv2.findChessboardCorners)
The found corners are then drawn onto the image
Using cv2.solvePnP, the approximate pose (Rotation and translation vectors) are derived
The 3d points that describe a square are then projected from the 3d space determined by step 3 into 2d space. This is used to convert a predetermined 3d structure into something you can use to graph on a 2d image.
However, this step instead finds the transformation to get from a set of 2d square points (the dimensions of the game frame) to the newly found projected 2d points (of the 3d frame). Now you can see that what we are trying to do is simply a two-step transformation (see the sketch after this list).
I then perform a basic tutorial-style addition of the captured stream frame and the transformed game frame to get a final image.
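A minimal sketch of those projection and transform steps (the names objp, corners, mtx, dist and gameFrame are assumptions standing in for the corresponding variables in states.py, and the hovering square coordinates are made up for illustration):
import cv2
import numpy as np

# assumed inputs: corners from cv2.findChessboardCorners, calibration mtx/dist,
# objp (the 3d board points) and gameFrame (the game surface as a numpy array)
ret, rvecs, tvecs = cv2.solvePnP(objp, corners, mtx, dist)

# a 3d square standing above the board, projected into the camera image
square3d = np.float32([[0,0,0], [2,0,0], [2,0,-2], [0,0,-2]])
projected2d, _ = cv2.projectPoints(square3d, rvecs, tvecs, mtx, dist)

# transformation that maps the game frame corners onto the projected square
h, w = gameFrame.shape[:2]
frameCorners = np.float32([[0,0], [w,0], [w,h], [0,h]])
ptMatrix = cv2.getPerspectiveTransform(frameCorners, np.float32(projected2d.reshape(-1,2)))
transMuffin = cv2.warpPerspective(gameFrame, ptMatrix, (640, 480))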
Variables:
+from3dTransMatrix -> the points of the projected 3d structure in 2d. These are the red dots you see
+q -> this is the reference plane that we determine the pose from
+ptMatrix -> the final transformation, to transform the game frame to fit in the projected frame
Check out the screens in the topmost folder ;]
enjoy!