affine transform (shear) calculate output shape - python

I want to apply an affine transformation (using scipy ndimage.affine_transform).
In particular I want to apply a shear in both dimensions so my transform matrix looks something like this:
transform = np.array([[1, degree_v, 0],
[degree_h, 1, 0],
[0, 0, 1]])
The resulting image returned by
ndimage.affine_transform(img, transform)
is cropped (which makes sense), and as I understand it I have to pass the correct output_shape and offset to keep the image uncropped.
How do I calculate these values?
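One possible way to get these values, sketched for the 2D case: ndimage.affine_transform treats the matrix as a map from output coordinates to input coordinates, so the inverse matrix tells you where the four input corners land. Their bounding box gives output_shape, and its minimum corner (mapped back through the matrix) gives offset. The helper name shear_uncropped is made up for illustration, and the sketch uses the 2x2 part of the matrix and assumes it is invertible.

```python
import numpy as np
from scipy import ndimage

def shear_uncropped(img, degree_v, degree_h, order=1):
    """Shear a 2D image without cropping (hypothetical helper)."""
    # affine_transform maps OUTPUT coordinates to INPUT coordinates:
    #   input_coord = matrix @ output_coord + offset
    matrix = np.array([[1.0, degree_v],
                       [degree_h, 1.0]])
    # input -> output direction; requires degree_v * degree_h != 1
    forward = np.linalg.inv(matrix)

    rows, cols = img.shape
    corners = np.array([[0, 0], [0, cols - 1],
                        [rows - 1, 0], [rows - 1, cols - 1]], dtype=float)
    moved = corners @ forward.T            # where the input corners end up
    lo, hi = moved.min(axis=0), moved.max(axis=0)

    # bounding box of the moved corners (small tolerance against rounding)
    output_shape = tuple(int(n) for n in np.ceil(hi - lo - 1e-9) + 1)
    # output index 0 must correspond to the bounding-box corner `lo`
    offset = matrix @ lo
    return ndimage.affine_transform(img, matrix, offset=offset,
                                    output_shape=output_shape, order=order)

sheared = shear_uncropped(np.ones((4, 5)), degree_v=0.5, degree_h=0.0)
print(sheared.shape)
```

The same bounding-box idea should carry over to the 3x3 homogeneous matrix from the question, since its last column is zero.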


How to rotate individual columns of an image?

Supposing I have an image with motion artifacts such that each column of the image should be rotated by a known amount, what would be the best way to get a motion-corrected output image? For example, take a 2x3 image like
import numpy as np
image = np.array([[1, 2, 3],
[1, 2, 3]])
column_rotation = np.array([0, 45, -45]) # degrees by which to rotate each column
rotation_pivot = np.array([0, 0, 0])
assert np.array_equal(dewarp(image, column_rotation, rotation_pivot),
                      np.array([[1, 2, 3],
                                [1, 3, 2]]))
Notice how the first column remained unchanged, second column was rotated around the 0th element (item in first row) by 45 degrees, and the third column was rotated around the 0th element by -45 degrees.
The best approach I have so far is to use griddata.
from scipy.interpolate import griddata
def dewarp(image, rotation_degrees, rotation_pivot) -> np.ndarray:
    # true (row, column) position of every pixel, shape (N, 2)
    coordinates = get_each_pixel_coordinates(rotation_degrees, rotation_pivot)
    grid = np.meshgrid(np.arange(image.shape[0]), np.arange(image.shape[1]), indexing='ij')
    return griddata(coordinates, image.flatten(), tuple(grid), method='nearest')
where get_each_pixel_coordinates() finds the true coordinate of each pixel in the original image.
My problem with this approach is that it's too slow for the image sizes I'm working with (which are actually 3D of shape (200, 500, 1000), but I would settle for a fast 2D solution).
If only griddata was implemented in cupyx.scipy with GPU support I suspect this approach would be fast enough.
griddata might also be suboptimal in that it doesn't take advantage of the fact that columns are rotated in bulk, and just treats each pixel as independently warped.
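For reference, here is a minimal sketch of how the coordinate helper could be vectorized per column. It assumes a sign convention in which a positive angle pushes pixels below the pivot towards higher column indices, which reproduces the 2x3 example above. The names pixel_coordinates and dewarp_nearest are hypothetical, and the sketch still relies on griddata, so it does not address the speed problem.

```python
import numpy as np
from scipy.interpolate import griddata

def pixel_coordinates(shape, rotation_degrees, rotation_pivot):
    """True (row, col) position of every pixel, as an (N, 2) array."""
    rows, cols = np.indices(shape).astype(float)
    theta = np.deg2rad(np.asarray(rotation_degrees, dtype=float))  # one angle per column
    pivot = np.asarray(rotation_pivot, dtype=float)                # pivot row per column
    dist = rows - pivot                  # signed distance from the pivot along the column
    true_rows = pivot + dist * np.cos(theta)
    true_cols = cols + dist * np.sin(theta)
    return np.column_stack([true_rows.ravel(), true_cols.ravel()])

def dewarp_nearest(image, rotation_degrees, rotation_pivot):
    """Resample the scattered pixels onto the regular grid (nearest neighbour)."""
    points = pixel_coordinates(image.shape, rotation_degrees, rotation_pivot)
    grid_rows, grid_cols = np.indices(image.shape)
    return griddata(points, image.ravel(), (grid_rows, grid_cols), method='nearest')

image = np.array([[1, 2, 3],
                  [1, 2, 3]])
print(dewarp_nearest(image, np.array([0, 45, -45]), np.array([0, 0, 0])))
```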

How to apply a transformation matrix to the plane defined by the origin and normal

I have a plane defined by an origin (a point) and a normal. I need to apply a 4 by 4 transformation matrix to it. How do I do this correctly?
This is a helpful wikipedia link.
When you are dealing with three-dimensional space, a 4 by 4 transformation matrix is most likely a representation of an affine transformation.
Check this Wikipedia link.
To apply this transformation, first represent the plane's origin as a 4x1 homogeneous vector (x, y, z, 1), where x, y, and z are the coordinates of a point on the plane and the last component is 1 to mark the vector as a position.
Next, multiply this vector by the transformation matrix to obtain a new 4x1 vector, which gives the position of the plane's origin after the transformation.
The normal vector should not be affected by the translation part of the transformation matrix, because a normal represents the orientation of a surface and not its position. Its homogeneous representation is therefore (x, y, z, 0).
Again, multiply this vector by the transformation matrix to obtain a new 4x1 vector, which gives the new orientation of the plane. This is exact when the upper-left 3x3 block is a rotation; if the matrix contains non-uniform scaling or shear, the normal must instead be multiplied by the inverse transpose of that 3x3 block to stay perpendicular to the plane.
Only the first 3 elements of the two resulting vectors describe the new origin and the new normal (in short, the new plane).
This is an example in Python:
import numpy as np
# Original plane
o = np.array([0, 0, 0, 1])
n = np.array([0, 0, 1])
# Transformation matrix
T = np.array([[1, 0, 0, 2],
[0, 1, 0, 3],
[0, 0, 1, 4],
[0, 0, 0, 1]])
# Apply transformation to the origin
o_new = T @ o
# Apply transformation to the normal (rotation part only, no translation)
n_new = T[:3, :3] @ n
print("New origin:", o_new[:3])
print("New normal:", n_new)
output :
New origin: [2 3 4]
New normal: [0 0 1]
Note: n_new = T[:3, :3] @ n is the same as giving n a fourth element of 0 and then computing n_new = (T @ n)[:3]
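The example above uses a pure translation, where multiplying the normal by the 3x3 block is enough. As a hedged sketch for general affine matrices (for instance with non-uniform scaling), the normal can be transformed with the inverse transpose of the 3x3 block instead. The function name transform_plane is made up for illustration, and the sketch assumes the last row of T is [0, 0, 0, 1] and the 3x3 block is invertible.

```python
import numpy as np

def transform_plane(origin, normal, T):
    """Transform a plane (origin, normal) by an affine 4x4 matrix T."""
    A = T[:3, :3]
    # points: homogeneous coordinate 1, so the translation applies
    new_origin = (T @ np.append(origin, 1.0))[:3]
    # normals: inverse transpose keeps them perpendicular to the plane
    new_normal = np.linalg.inv(A).T @ normal
    new_normal = new_normal / np.linalg.norm(new_normal)
    return new_origin, new_normal

# Non-uniform scaling along x plus a translation
T = np.array([[2.0, 0.0, 0.0, 2.0],
              [0.0, 1.0, 0.0, 3.0],
              [0.0, 0.0, 1.0, 4.0],
              [0.0, 0.0, 0.0, 1.0]])
origin = np.array([0.0, 0.0, 0.0])
normal = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)

new_origin, new_normal = transform_plane(origin, normal, T)
print("New origin:", new_origin)
print("New normal:", new_normal)
```

For a rotation matrix the inverse transpose equals the matrix itself, so this reduces to the simpler formula in the answer.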

How to convert sparse to dense adjacency matrix?

I am trying to convert a sparse adjacency matrix/list that only contains the indices of the non-zero elements ([[rows], [columns]]) to a dense matrix that contains 1s at those indices and 0s otherwise. I found a solution using to_dense_adj from PyTorch Geometric (Documentation). But this does not do exactly what I want, since the shape of the dense matrix is not as expected. Here is an example:
sparse_adj = torch.tensor([[0, 1, 2, 1, 0], [0, 1, 2, 3, 4]])
So the dense matrix should be of size 3x5 (the second array "stores" the columns; with non-zero elements at (0,0), (1,1), (2,2), (1,3) and (0,4)) because the elements in the first array are less than or equal to 2.
dense_adj = to_dense_adj(sparse_adj)[0]
outputs a dense matrix, but of shape (5,5). Is it possible to define the output shape or is there a different solution to get what I want?
Edit: I have a solution to convert it back to the sparse representation now that works
dense_adj = torch.sparse.FloatTensor(sparse_adj, torch.ones(5), torch.Size([3,5])).to_dense()
ind = dense_adj.nonzero(as_tuple=False).t().contiguous()
sparse_adj = torch.stack((ind[1], ind[0]), dim=0)
Or is there any alternative way that is better?
You can achieve this by first constructing a sparse matrix with torch.sparse and then converting it to a dense matrix. For this you will need to provide torch.sparse.FloatTensor with a 2D tensor of indices, a tensor of values, and an output size:
sparse_adj = torch.tensor([[0, 1, 2, 1, 0], [0, 1, 2, 3, 4]])
torch.sparse.FloatTensor(sparse_adj, torch.ones(5), torch.Size([3,5])).to_dense()
You can get the size of the output matrix dynamically with
sparse_adj.max(axis=1).values + 1
So it becomes:
torch.sparse.FloatTensor(sparse_adj, torch.ones(5), (sparse_adj.max(axis=1).values + 1).tolist())
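As a hedged variant of the same idea, recent PyTorch versions offer torch.sparse_coo_tensor as the factory function for COO tensors, and a plain index assignment into a zero tensor gives the same result for this input. Both are sketches under the assumption that the index tensor has the [[rows], [columns]] layout from the question.

```python
import torch

sparse_adj = torch.tensor([[0, 1, 2, 1, 0],
                           [0, 1, 2, 3, 4]])

# output size derived from the largest row and column index: [3, 5]
size = (sparse_adj.max(dim=1).values + 1).tolist()
values = torch.ones(sparse_adj.shape[1])

# Variant 1: build a sparse COO tensor, then densify
dense_adj = torch.sparse_coo_tensor(sparse_adj, values, size).to_dense()

# Variant 2: write ones directly into a zero matrix
dense_alt = torch.zeros(size)
dense_alt[sparse_adj[0], sparse_adj[1]] = 1.0

print(dense_adj.shape)
```

One difference to keep in mind: if an index pair occurs twice, the sparse route sums the values, while the index assignment still yields 1.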

What do I do with the fundamental matrix?

I am trying to reconstruct a 3d shape from multiple 2d images.
I have calculated a fundamental matrix, but now I don't know what to do with it.
I am finding multiple conflicting answers on stack overflow and academic papers.
For example, one answer says you need to compute the rotation and translation matrices from the fundamental matrix.
Another says you need to find the camera matrices.
A third says you need to find the homographies.
A fourth says you need to find the epipolar lines.
Which is it?? (And how do I do it? I have read the H&Z book but I do not understand it. It says I can 'easily' use the 'direct formula' in result 9.14, but result 9.14 is neither easy nor direct to understand.)
Stack overflow wants code so here's what I have so far:
import numpy as np
import cv2
# let's create some sample data
Wpts = np.array([[1, 1, 1, 1], # A Cube in world points
[1, 2, 1, 1],
[2, 1, 1, 1],
[2, 2, 1, 1],
[1, 1, 2, 1],
[1, 2, 2, 1],
[2, 1, 2, 1],
[2, 2, 2, 1]])
Cpts = np.array([[0, 4, 0, 1], #slightly up
[4, 0, 0, 1],
[-4, 0, 0, 1],
[0, -4, 0, 1]])
Cangles = np.array([[0, -1, 0], #slightly looking down
[-1, 0, 0],
[1, 0, 0],
[0, 1, 0]])
views = []
transforms = []
clen = len(Cpts)
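for i in range(clen):
    cangle = Cangles[i]
    cpt = Cpts[i]
    transform = cameraTransformMatrix(cangle, cpt)
    newpts = np.dot(Wpts, transform.T)
    view = cameraView(newpts)
    # keep every view and transform so they can be compared afterwards
    views.append(view)
    transforms.append(transform)
# fundamental matrix between the first two views
F = cv2.findFundamentalMat(views[0], views[1])[0]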
## now what??? How do I recover the cube shape?
Edit: I do not know the camera parameters
Fundamental Matrix
First, listen to the fundamental matrix song ;).
The Fundamental Matrix only describes the mathematical relationship between your point correspondences in 2 images (x' in image 2, x in image 1). "That means, for all pairs of corresponding points holds x'^T F x = 0" (Wikipedia). This also means that outliers or incorrect point correspondences directly affect the quality of your fundamental matrix.
Additionally, a similar structure exists for the relationship of point correspondences between 3 images which is called Trifocal Tensor.
A 3D reconstruction using exclusively the properties of the Fundamental Matrix is not possible because "The epipolar geometry is the intrinsic projective geometry between two views. It is independent of scene structure, and only depends on the cameras' internal parameters and relative pose." (HZ, p.239).
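To make the constraint x'^T F x = 0 concrete, here is a small synthetic check in NumPy. All numbers (the camera matrix K, the rotation R, the translation t, and the cube) are made-up assumptions for illustration; F is built from the known pose rather than estimated from correspondences.

```python
import numpy as np

def skew(t):
    """Cross-product matrix, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Assumed intrinsics and relative pose of camera 2 with respect to camera 1
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
a = np.deg2rad(10.0)
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([-1.0, 0.0, 0.0])

# Fundamental matrix from the known geometry: F = K^-T [t]x R K^-1
K_inv = np.linalg.inv(K)
F = K_inv.T @ skew(t) @ R @ K_inv

# Cube corners in front of both cameras, projected into each image
cube = np.array([[x, y, z] for x in (1, 2) for y in (1, 2) for z in (5, 6)], dtype=float)
x1 = (K @ cube.T).T
x2 = (K @ (R @ cube.T + t[:, None])).T
x1 = x1 / x1[:, 2:3]   # homogeneous pixel coordinates (u, v, 1)
x2 = x2 / x2[:, 2:3]

# One residual x2_i^T F x1_i per correspondence; all should be ~0
residuals = np.einsum('ij,jk,ik->i', x2, F, x1)
print(np.abs(residuals).max())
```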
Camera matrix
Referring to your question of how to reconstruct the shape from multiple images: you need to know the camera matrices of your images (K', K). The camera matrix is a 3x3 matrix composed of the camera focal lengths or principal distance (fx, fy) as well as the optical center or principal point (cx, cy).
You can derive your camera matrix using camera calibration.
Essential matrix
When you know your camera matrices you can extend your Fundamental Matrix to an Essential Matrix E = K'^T F K.
Put somewhat sloppily, your Fundamental Matrix is now "calibrated".
The Essential Matrix can be used to get the rotation (rotation matrix R) and translation (vector t) of your second image relative to your first image, but only up to scale: t will be a unit vector. For this purpose you can use the OpenCV functions decomposeEssentialMat or recoverPose (which uses the cheirality check), or read the more detailed explanations in HZ.
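The "calibration" step can be sketched in a few lines of NumPy. The intrinsics and pose below are made-up values used only to produce a consistent F; with real data F would come from findFundamentalMat and K from calibration. The check at the end relies on the known property that an essential matrix has two equal singular values and one zero singular value.

```python
import numpy as np

def essential_from_fundamental(F, K1, K2):
    """E = K2^T F K1, with K1 for image 1 (x) and K2 for image 2 (x')."""
    return K2.T @ F @ K1

# Assumed intrinsics (same camera for both views) and relative pose
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
a = np.deg2rad(10.0)
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([-1.0, 0.0, 0.0])          # unit length
t_cross = np.array([[0.0, -t[2], t[1]],
                    [t[2], 0.0, -t[0]],
                    [-t[1], t[0], 0.0]])

# Synthetic fundamental matrix for this geometry
K_inv = np.linalg.inv(K)
F = K_inv.T @ t_cross @ R @ K_inv

E = essential_from_fundamental(F, K, K)
singular_values = np.linalg.svd(E, compute_uv=False)
print(singular_values)
```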
Projection matrix
Knowing your translation and rotation, you can build the projection matrices for your images. The projection matrix is defined as P = K[R|t]. Finally, you can use triangulation (triangulatePoints) to derive the 3D coordinates of your image points. I recommend a subsequent bundle adjustment to obtain a proper configuration. There is also an sfm module in OpenCV.
Since homography or epipolar line knowledge is not strictly necessary for the 3D reconstruction, I did not explain these concepts.
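A minimal NumPy sketch of the last two steps, assuming K, R and t are already known: it builds P = K[R|t] for both views and triangulates with the plain linear (DLT) method. This is a stand-in for OpenCV's triangulatePoints, not its implementation, and the camera values and cube are made-up test data.

```python
import numpy as np

def projection_matrix(K, R, t):
    """P = K [R | t], a 3x4 matrix."""
    return K @ np.hstack([R, np.asarray(t, dtype=float).reshape(3, 1)])

def project(P, X):
    """Project (N, 3) world points to (N, 2) pixel coordinates."""
    Xh = np.hstack([X, np.ones((len(X), 1))])
    x = Xh @ P.T
    return x[:, :2] / x[:, 2:3]

def triangulate_dlt(P1, P2, x1, x2):
    """Linear triangulation of (N, 2) pixel correspondences, returns (N, 3)."""
    points = []
    for (u1, v1), (u2, v2) in zip(x1, x2):
        # each view contributes two linear equations in the homogeneous point
        A = np.array([u1 * P1[2] - P1[0],
                      v1 * P1[2] - P1[1],
                      u2 * P2[2] - P2[0],
                      v2 * P2[2] - P2[1]])
        _, _, Vt = np.linalg.svd(A)
        X = Vt[-1]                       # null vector of A
        points.append(X[:3] / X[3])
    return np.array(points)

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
a = np.deg2rad(10.0)
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([-1.0, 0.0, 0.0])

P1 = projection_matrix(K, np.eye(3), np.zeros(3))   # first camera at the origin
P2 = projection_matrix(K, R, t)

cube = np.array([[x, y, z] for x in (1, 2) for y in (1, 2) for z in (5, 6)], dtype=float)
recovered = triangulate_dlt(P1, P2, project(P1, cube), project(P2, cube))
print(np.abs(recovered - cube).max())
```

With t recovered from the essential matrix, the reconstruction is only defined up to a global scale, so real results would match the true cube only after scaling.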
With your fundamental matrix, you can determine the camera matrices P and P' in a canonical form as stated in (HZ, pp. 254-256). From these camera matrices you can theoretically triangulate a projective reconstruction that differs from the real scene by an unknown projective transformation.
It has to be noted that the linear triangulation methods aren't suitable for projective reconstruction, as stated in
(HZ, Discussion, p. 313) ["...neither of these two linear methods is quite suitable for projective reconstruction, since they are not projective-invariant."]
and therefore the triangulation technique recommended there should be used to obtain valuable results (it is actually more work to implement).
From this projective reconstruction you could use self-calibration approaches, which can work in some scenarios but will not yield the accuracy and robustness that you can obtain with a calibrated camera and the use of the essential matrix to compute the motion parameters.

How to perform iterative 2D operation on 4D numpy array

Let me preface this post by saying that I'm pretty new to Python and NumPy, so I'm sure I'm overlooking something simple. What I'm trying to do is image processing over a PGM (grayscale) file using a mask (a mask convolution operation); however, I don't want to do it using the SciPy all-in-one imaging processing libraries that are available—I'm trying to implement the masking and processing operations myself. What I want to do is the following:
Iterate a 3x3 sliding window over a 256x256 array
At each iteration, I want to perform an operation with a 3x3 image mask (array that consists of fractional values < 1 ) and the 3x3 window from my original array
The operation is that the image mask gets multiplied by the 3x3 window, and that the results get summed up into one number, which represents a weighted average of the original 3x3 area
This sum should get inserted back into the center of the 3x3 window, with the original surrounding values left untouched
However, the output of one of these operations shouldn't be the input of the next operation, so a new array should be created or the original 256x256 array shouldn't be updated until all operations have completed.
The process is sort of like this, except I need to put the result of the convolved feature back into the center of the window it came from:
So, in this above example, the 4 would go back into the center position of the 3x3 window it came from (after all operations had concluded), so it would look like [[1, 1, 1], [0, 4, 1], [0, 0, 1]] and so on for every other convolved feature obtained. A non-referential copy could also be made of the original and this new value inserted into that.
So, this is what I've done so far: I have a 256x256 2D numpy array which is my source image. Using as_strided, I convert it into a 4D numpy array of 3x3 slices. The main problem I'm facing is that I want to execute the operation I've specified over each slice. I'm able to perform it on one slice, but in the np.sum operations I've tried, it adds up all the slices' results into one value. After this, I either want to create a new 256x256 array with the results, in the fashion that I've described, or iterate over the original, replacing the middle values of each 3x3 window as appropriate. I've tried using ndenumerate to change just the same value (v, x, 1, 1) of my 4D array each time, but since the index returned from my 4D array is of the form (v, x, y, z), I can't seem to figure out how to only iterate through (v, x) and leave the last two parts as constants that shouldn't change at all.
Here's my code thus far:
import numpy as np
from numpy.lib import stride_tricks
# create 256x256 NumPy 2D array from image data and image size so we can manipulate the image data, then create a 4D array of strided windows
# currently, it's only taking 10 slices to test with
imageDataArray = np.array(parsedPGMFile.imageData, dtype=int).reshape(parsedPGMFile.numRows, parsedPGMFile.numColumns)
xx = stride_tricks.as_strided(imageDataArray, shape=(1, 10, 3, 3), strides=imageDataArray.strides + imageDataArray.strides)
# create the image mask to be used
mask = [1,2,1,2,4,2,1,2,1]
mask = np.array(mask, dtype=float).reshape(3, 3)/16
# this will execute the operation on just the first 3x3 element of xx, but need to figure out how to iterate through all elements and perform this operation individually on each element
result = np.sum(mask * xx[0,0])
Research from several other sources (as well as SO) was very helpful, but they don't seem to address what I'm trying to do exactly (unless I'm missing something obvious). I could probably use a ton of for loops, but I'd rather learn how to do it using these awesome Python libraries we have. I also realize I'm combining a few questions together, but that's only because I have the sneaking suspicion that this can all be done very simply! Thanks in advance for any help!
When you need to multiply element-wise, then reduce with addition, think np.einsum:
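import numpy as np
from numpy.lib.stride_tricks import as_strided
arr = np.random.rand(256, 256)
mask = np.random.rand(3, 3)
# view of every 3x3 window: arr_view[i, j] is arr[i:i+3, j:j+3]
arr_view = as_strided(arr, shape=(254, 254, 3, 3), strides=arr.strides*2)
# multiply each window by the mask, sum, and write the result into the window centres
arr[1:-1, 1:-1] = np.einsum('ijkl,kl->ij', arr_view, mask)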
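A slightly safer variant of the same idea, as a sketch: numpy.lib.stride_tricks.sliding_window_view (NumPy 1.20 and later) builds the window view without hand-written strides, and writing into a copy leaves the source image untouched, as the question asks. The function name weighted_average_filter is made up, and the border handling assumes a 3x3 mask.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def weighted_average_filter(image, mask):
    """Replace each interior pixel by the mask-weighted sum of its 3x3 window."""
    windows = sliding_window_view(image, mask.shape)   # shape (H-2, W-2, 3, 3)
    result = image.astype(float)                       # astype returns a new array
    result[1:-1, 1:-1] = np.einsum('ijkl,kl->ij', windows, mask)
    return result

mask = np.array([1, 2, 1, 2, 4, 2, 1, 2, 1], dtype=float).reshape(3, 3) / 16
image = np.arange(25).reshape(5, 5)
filtered = weighted_average_filter(image, mask)
print(filtered)
```

Border pixels keep their original values because only the interior slice is overwritten.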
Based on the example illustration:
In [1]: import numpy as np
In [2]: from scipy.signal import convolve2d
In [3]: image = np.array([[1,1,1,0,0],[0,1,1,1,0],[0,0,1,1,1],[0,0,1,1,0],[0,1,1,0,0]])
In [4]: m = np.array([[1,0,1],[0,1,0],[1,0,1]])
In [5]: convolve2d(image, m, mode='valid')
array([[4, 3, 4],
[2, 4, 3],
[2, 3, 4]])
And putting it back where it came from:
In [6]: image[1:-1,1:-1] = convolve2d(image, m, mode='valid')
In [7]: image
array([[1, 1, 1, 0, 0],
[0, 4, 3, 4, 0],
[0, 2, 4, 3, 1],
[0, 2, 3, 4, 0],
[0, 1, 1, 0, 0]])
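Two details of the session above may matter in practice, shown here as a hedged sketch: the assignment in In [6] overwrites image itself, and convolve2d flips the kernel before sliding it, which makes no difference for the symmetric mask used here but does for an asymmetric one. scipy.signal.correlate2d applies the mask without flipping, and writing into a copy keeps the original. The helper name apply_mask is hypothetical and assumes a 3x3 mask.

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

def apply_mask(image, mask):
    """Return a copy of image whose interior holds the mask-weighted window sums."""
    out = image.copy()
    # correlate2d multiplies each window by the mask as written (no kernel flip)
    out[1:-1, 1:-1] = correlate2d(image, mask, mode='valid')
    return out

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
m = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1]])

result = apply_mask(image, m)
print(result)
print(image)   # still the original values
```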
