Hello, I would like to find the 3D position of an object given two different views of it.
Things that I can provide here are:
The intrinsic matrix of each camera (I can calculate it).
The 2D coordinates of the object in each image (see here).
Bounding boxes of the object.
Things that I cannot provide here are:
The 3D positions or the relative position of the two cameras.
The 3D position of the object.
Measurements of the object.
These are the functions I use to obtain the center coordinates for each camera and the intrinsic parameters.
# This function uses a custom-trained Faster R-CNN model to detect the object, and
# the center of the object is calculated from the bounding box.
# For simplicity the centers are hardcoded, since the object won't move.
def calculateCenterAndBoundingBox(image):
    boundingBox1 = [(715.329, 383.64413), (746.09143, 402.87524)]
    boundingBox2 = [(303.78778, 391.57953), (339.4821, 412.69092)]
    if image == 1:
        return (730.7102, 393.2597), boundingBox1
    return (321.63495, 402.13522), boundingBox2

# For simplicity, both cameras share the same intrinsic matrix
def calculateIntrinsic():
    return [[512, 0.0, 512],
            [0.0, 483.0443151, 364],
            [0.0, 0.0, 1.0]]
I tried to determine the position of my object with the 8-point algorithm, so I decided to extract feature keypoints with SIFT using this implementation.
%matplotlib inline
import matplotlib.pylab as plt
import numpy as np
import pysift
import math
import cv2

def myPlot(img):
    plt.figure(figsize=(15,20))
    plt.imshow(img) # display the output image

pathToImage1 = "testImage1.png"
c1, bb1 = calculateCenterAndBoundingBox(1)
originalImage1 = cv2.imread(pathToImage1)
img1 = cv2.imread(pathToImage1, 0) # 0 = load as grayscale
# crop both versions to the bounding box (rows are y, columns are x)
originalImage1 = originalImage1[math.floor(bb1[0][1]):math.floor(bb1[1][1]), math.floor(bb1[0][0]):math.floor(bb1[1][0])]
img1 = img1[math.floor(bb1[0][1]):math.floor(bb1[1][1]), math.floor(bb1[0][0]):math.floor(bb1[1][0])]
keypoints1, descriptors1 = pysift.computeKeypointsAndDescriptors(img1)

pathToImage2 = "testImage2.png"
c2, bb2 = calculateCenterAndBoundingBox(2)
originalImage2 = cv2.imread(pathToImage2)
img2 = cv2.imread(pathToImage2, 0)
originalImage2 = originalImage2[math.floor(bb2[0][1]):math.floor(bb2[1][1]), math.floor(bb2[0][0]):math.floor(bb2[1][0])]
img2 = img2[math.floor(bb2[0][1]):math.floor(bb2[1][1]), math.floor(bb2[0][0]):math.floor(bb2[1][0])]
# separate names, so the keypoints of image 1 are not overwritten
keypoints2, descriptors2 = pysift.computeKeypointsAndDescriptors(img2)
However, I only got 1 feature keypoint instead of 8 or more.
So apparently I can't use the 8-point algorithm in this case.
But I have no other ideas how to solve this problem given the constraints above.
Is it even possible to calculate the 3D position given only 2D points and the intrinsic matrix of each camera?
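For reference, here is a minimal sketch of what a single 2D point plus an intrinsic matrix does provide: a viewing ray in that camera's own coordinate frame, not a 3D position. It reuses the intrinsic values and the first center from above; the name pixel_to_ray is only illustrative. Intersecting the rays of both cameras would additionally require their relative rotation and translation, which is exactly the information listed as unavailable.

```python
import numpy as np

# Intrinsic matrix, same values as calculateIntrinsic() above
K = np.array([[512.0, 0.0, 512.0],
              [0.0, 483.0443151, 364.0],
              [0.0, 0.0, 1.0]])

def pixel_to_ray(K, pixel):
    """Unit direction of the viewing ray through `pixel`,
    expressed in the camera's own coordinate frame (hypothetical helper)."""
    u, v = pixel
    direction = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return direction / np.linalg.norm(direction)

ray1 = pixel_to_ray(K, (730.7102, 393.2597))
print(ray1)

# Every point t * ray1 with t > 0 projects back onto the same pixel,
# so the depth t cannot be recovered from this one view alone.
for t in (1.0, 5.0, 50.0):
    p = K @ (t * ray1)
    print(t, p[:2] / p[2])
```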
I am trying to write a class that uses the Laplacian of Gaussian for blob detection. Here is the class that I wrote:
import cv2 as cv
import numpy as np
from sklearn.preprocessing import normalize
from scipy import ndimage

class Blob:
    BlobStack = [[[[]]]]

    def _getLGKernel(self,sx,sy):
        kx = (int(sx - 0.8)*4+1)*2+1
        ky = (int(sy - 0.8)*4+1)*2+1
        gaussKernelX = np.float32(cv.getGaussianKernel(kx,sx))
        gaussKernelY = np.float32(cv.getGaussianKernel(ky,sy))
        d2GaussKernelX = cv.Laplacian(gaussKernelX,cv.CV_32F)[1:kx-1]
        d2GaussKernelY = cv.Laplacian(gaussKernelY,cv.CV_32F)[1:ky-1]
        d2GaussKernelX = normalize(d2GaussKernelX,axis=0)
        d2GaussKernelY = normalize(d2GaussKernelY,axis=0)
        return d2GaussKernelX, d2GaussKernelY

    def __init__(self,img,mins,maxs,steps):
        step = int((maxs-mins)/steps)
        r = range(mins,maxs+step,step)
        self.BlobStack = np.zeros((np.size(r),np.size(r),np.shape(img)[0],np.shape(img)[1]),dtype= np.float32)
        for i,sx in enumerate(r):
            for j,sy in enumerate(r):
                kernx, kerny = self._getLGKernel(sx, sy)
                self.BlobStack[i,j] = np.abs(cv.filter2D(img,cv.CV_64F,kernx,borderType = cv.BORDER_REPLICATE) \
                    +cv.filter2D(img,cv.CV_64F,np.transpose(kerny),borderType = cv.BORDER_REPLICATE))

    def findLocalBlobs(self,sz):
        LocalBlobs = np.where(ndimage.maximum_filter(self.BlobStack,size=sz) == self.BlobStack)
        LocalBlobs = np.asarray(LocalBlobs).transpose()
        return LocalBlobs
Right now I am using a test case with two circles of different darkness on a white background. I am getting many extra blobs that shouldn't be there. Some are at the minimum or maximum kernel size, but some are at other kernel sizes. At certain kernel sizes (sometimes, but not always, the maximum kernel size in one direction) the response has a constant value over a large region, and that region is detected as a local extremum. This value is lower than that of the actual blobs I want to detect, but still well above what could be floating-point error. Straightforward thresholding on the LoG value doesn't work, because some of the 'phantom' blobs have a higher value than the circle with the lower contrast to the background.
I am more interested in help from a theoretical image processing perspective, as this is just a project that I am using to learn from, but any help would be appreciated.
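On the theoretical side, one thing that may be worth checking is how responses at different kernel sizes are made comparable. A common approach (an assumption here, not what the class above does) is to sample the 2D LoG analytically, multiply it by sigma^2 (scale normalization), and force the kernel to sum to zero so that flat regions respond with zero. The sketch below is isotropic (a single sigma rather than separate sx and sy), and log_kernel is an illustrative name.

```python
import numpy as np

def log_kernel(sigma):
    """Scale-normalized, zero-sum Laplacian-of-Gaussian kernel (isotropic sketch)."""
    radius = int(np.ceil(3 * sigma))
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    r2 = x ** 2 + y ** 2
    gauss = np.exp(-r2 / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2)
    log = (r2 - 2.0 * sigma ** 2) / sigma ** 4 * gauss   # analytic LoG
    log = log * sigma ** 2        # scale normalization
    log = log - log.mean()        # truncation leaves a small offset; remove it
    return log

kernel = log_kernel(2.0)
flat_patch = np.full(kernel.shape, 200.0)
# Response on a perfectly constant region is zero up to rounding error
print(np.sum(kernel * flat_patch))
```

Note that a plateau of identical responses still satisfies a test of the form maximum_filter(stack) == stack, so such a test is usually combined with a threshold above zero.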
I am quite intrigued by the idea of a homography and am trying to get it to work in a minimal example with Python and OpenCV. Yet my tests do not pass, and I am not quite sure why. I pass a set of corresponding points into the findHomography function according to this and then multiply the homography matrix with my point to obtain the new point.
So the idea is to find the planar coordinate transformation and then transform the points with
X' = H@X
where X' are the new coordinates and X are the coordinates in the original coordinate frame.
Here is a minimal code example:
import cv2
import numpy as np
import matplotlib.pyplot as plt

points = np.array([
    [675, 585],
    [675, 1722],
    [3155, 580],
    [3162, 1722],
])
t_points = np.array([
    [0, 0],
    [0, 8.23],
    [23.77, 0],
    [23.77, 8.23]
])
pt = np.array([675, 580+(1722-580)/2, 0])
pt_test = np.array([0,8.23/2, 0])

def get_h_matrix(src_list, dst_list):
    src_pts = np.array(src_list).reshape(-1,1,2)
    dst_pts = np.array(dst_list).reshape(-1,1,2)
    H, mask = cv2.findHomography(src_pts, dst_pts)
    return H

H = get_h_matrix(points, t_points)
transformed = H@pt
plt.scatter(t_points[:,0], t_points[:,1], color = 'blue')
plt.scatter(transformed[0], transformed[1], color = 'orange')
plt.scatter(pt_test[0], pt_test[1], color = 'green')
plt.scatter(points[:,0], points[:,1], color = 'blue')
plt.scatter(pt[0],pt[1], color = 'orange')
where the output corresponds to the following plot:
Plot of the coordinate transformation. We can see that the green point, where the transformed point actually should be, is not even close to the orange point, to which the homography transformed the point.
Maybe somebody can spot the error in my train of thought.
Your help is kindly appreciated.
EDIT: I swapped the points array a few times because I thought I had made a mistake, but I still get the wrong transformation.
As Micka mentioned in the comments, the problem is the representation of the test points. In homogeneous coordinates, the point has to be
pt = [x,y,1]
instead of
pt = [x,y,0]
After the transformation, the homogeneous coordinates get converted back by
pt' = pt'/pt'[2]
I appreciate the help.
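For completeness, the two steps from this fix (append a 1, then divide by the third component) can be sketched as follows. The matrix H below is made up for illustration and is not the result of findHomography; apply_homography is a hypothetical helper name.

```python
import numpy as np

def apply_homography(H, point_xy):
    """Map a 2D point through a 3x3 homography H."""
    x, y = point_xy
    p = H @ np.array([x, y, 1.0])   # homogeneous input: third entry is 1, not 0
    return p[:2] / p[2]             # divide by the third component

# Illustrative H only: the last row is not [0, 0, 1], so the division matters.
H = np.array([[2.0, 0.0, 10.0],
              [0.0, 2.0, 5.0],
              [0.0, 0.0, 2.0]])

print(apply_homography(H, (4.0, 6.0)))  # x = 9.0, y = 8.5
```

For whole arrays of points, cv2.perspectiveTransform performs the same multiply-and-divide.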
I am trying to create a 2D log chromaticity plot in python with OpenCV. The same question was asked here
How to compute 2D log-chromaticity?
but it was never answered.
(ASIDE: A guess was made that the axes must be logarithmic instead of linear, but this is incorrect, as the paper uses negative coordinates and log axes cannot be negative. Also, I was desperate and tried plt.xscale('log') and plt.yscale('log'), but it didn't work.)
This work is based off this paper:
(I re-mention it below)
My Code:
import numpy as np
import cv2
import os
import matplotlib.pyplot as plt

root = r'.\path\to\root'
fl = r'my_img.jpg'

if __name__ == '__main__':
    img = cv2.imread(os.path.join(root, fl))
    cv2.imshow('Original', img)
    b, g, r = cv2.split(img)
    img_sum = np.sum(img, axis = 2) # NOTE: each channel is uint8 (max 255),
                                    # but np.sum accumulates in a wider
                                    # unsigned integer type, so the sum of
                                    # three channels cannot overflow
    # "Normalized" channels
    # NOTE: np.ma is the masked array library. It automatically masks
    # inf and nan answers from result
    n_r = np.ma.divide(1.*r, g)
    n_b = np.ma.divide(1.*b, g)
    log_rg = np.ma.log( n_r )
    log_bg = np.ma.log( n_b )
    plt.scatter(log_rg, log_bg, s = 2)
    plt.title('2D Log Chromaticity')
Color Checker Chart
My Log Chromaticity Plot
Expected Result:
Finlayson Log Chromaticity Plot
The expected result was taken from this paper ("Intrinsic Images by Entropy Minimization", by Finlayson, G., et al.):
(Paper also mentioned above)
Can you help me please?!
This is the closest I can figure. Reading through this:
I came across the sentence:
"Fig. 2(a) shows log-chromaticities for the 24 surfaces of a Macbeth ColorChecker Chart, (the six neutral patches all belong to the same cluster). If we now vary the lighting and plot median values for each patch, we see the curves in Fig. 2(b)."
If you look closely at the log-chromaticity plot, you see 19 blobs, corresponding to each of the 18 colors in the Macbeth chart, plus the sum of all the 6 grayscale targets in the bottom row:
Explanation of Log Chromaticities
With one picture, we can only get one point of each blob: we take the median value inside each target and plot it. To get the plot from the paper, we would have to create multiple images with different lighting. We might be able to do this by varying the color temperature of the image in an image editor.
For now, I just looked at the color patches in the original image and plotted the points:
The graph dots are not all in the same place as the paper, but I figure it's fairly close. Would someone please check my work to see if this makes sense?
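The median-per-patch step described above could be sketched like this. It assumes that each patch has already been cropped out as a BGR array (as returned by cv2.imread), that the median is taken per channel before forming the ratios, and that the median green value is nonzero. patch_log_chromaticity is an illustrative name, and the patch here is synthetic.

```python
import numpy as np

def patch_log_chromaticity(patch_bgr):
    """Median log(R/G) and log(B/G) of one patch given as an (h, w, 3) BGR array."""
    pixels = patch_bgr.reshape(-1, 3).astype(np.float64)
    b, g, r = np.median(pixels, axis=0)   # per-channel median over the patch
    return np.log(r / g), np.log(b / g)

# Synthetic 4x4 patch with B=50, G=100, R=200
patch = np.zeros((4, 4, 3), dtype=np.uint8)
patch[:] = (50, 100, 200)

log_rg, log_bg = patch_log_chromaticity(patch)
print(log_rg, log_bg)   # log(2) and log(0.5)
```

Calling this once per color patch gives one (log_rg, log_bg) point per patch, which is what a single image can contribute to the plot.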
There are many questions here that check whether two images are "nearly" similar or not.
My task is simple. With OpenCV, I want to find out if two images are 100% identical or not.
They will be of same size but can be saved with different filenames.
You can use a logical operator such as XOR. If you are using Python, you can use the following one-line function:
import numpy as np

def is_similar(image1, image2):
    return image1.shape == image2.shape and not(np.bitwise_xor(image1,image2).any())
where shape is the property that gives the size of the matrix and bitwise_xor is as the name suggests. The C++ version can be made in a similar way!
Please see @berak's code.
Notice: The Python code works for images of any dimensionality (1-D, 2-D, 3-D, ...), but the C++ version works only for 2-D images. It's easy to convert it to any dimensionality yourself. I hope that gives you the insight! :)
Doc: bitwise_xor
EDIT: The C++ version was removed. Thanks to @Micka and @berak for their comments.
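The sum of the absolute differences should be 0 (for all channels):
bool equal(const Mat & a, const Mat & b)
{
    if ( (a.rows != b.rows) || (a.cols != b.cols) || (a.type() != b.type()) )
        return false;
    // absdiff is used because a - b saturates negative differences to 0
    // for unsigned types, which could hide real differences
    Mat diff;
    absdiff(a, b, diff);
    Scalar s = sum(diff);
    return (s[0]==0) && (s[1]==0) && (s[2]==0);
}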
import cv2
import numpy as np
a = cv2.imread("picture1.png")
b = cv2.imread("picture2.png")
# absdiff keeps differences in both directions; cv2.subtract would clip
# negative differences to 0 for uint8 images
difference = cv2.absdiff(a, b)
result = not np.any(difference)
if result:
    print("Pictures are the same")
else:
    print("Pictures are different")
If they are the same files, just saved under different filenames, you can check whether their checksums are identical.
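A minimal sketch of such a checksum comparison, using only the standard library. The function names are illustrative. Note that this compares the bytes on disk, so two files holding the same pixels but written with different compression settings or metadata would count as different.

```python
import hashlib

def file_sha256(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_file_contents(path_a, path_b):
    """True if both files have identical bytes (up to hash collisions)."""
    return file_sha256(path_a) == file_sha256(path_b)
```

Usage would be something like same_file_contents("picture1.png", "picture2.png").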
Import the packages we’ll need: matplotlib for plotting, NumPy for numerical processing, and cv2 for our OpenCV bindings. The Structural Similarity Index method is already implemented for us by scikit-image, so we’ll just use their implementation.
# import the necessary packages
# (structural_similarity lives in skimage.metrics in current scikit-image releases)
from skimage.metrics import structural_similarity as ssim
import matplotlib.pyplot as plt
import numpy as np
import cv2
Then define the compare_images function, which we’ll use to compare two images using both MSE and SSIM. It takes three arguments: imageA and imageB, which are the two images we are going to compare, and the title of our figure.
We then compute the MSE and SSIM between the two images.
We also simply display the MSE and SSIM associated with the two images we are comparing.
def mse(imageA, imageB):
    # the 'Mean Squared Error' between the two images is the
    # sum of the squared difference between the two images;
    # NOTE: the two images must have the same dimension
    err = np.sum((imageA.astype("float") - imageB.astype("float")) ** 2)
    err /= float(imageA.shape[0] * imageA.shape[1])
    # return the MSE, the lower the error, the more "similar"
    # the two images are
    return err

def compare_images(imageA, imageB, title):
    # compute the mean squared error and structural similarity
    # index for the images
    m = mse(imageA, imageB)
    s = ssim(imageA, imageB)
    # setup the figure
    fig = plt.figure(title)
    plt.suptitle("MSE: %.2f, SSIM: %.2f" % (m, s))
    # show first image
    ax = fig.add_subplot(1, 2, 1)
    plt.imshow(imageA, cmap = plt.cm.gray)
    # show the second image
    ax = fig.add_subplot(1, 2, 2)
    plt.imshow(imageB, cmap = plt.cm.gray)
    # show the images
    plt.show()
Load images off disk using OpenCV. We’ll be using original image, contrast adjusted image, and our Photoshopped image
We then convert our images to grayscale
# load the images -- the original, the original + contrast,
# and the original + photoshop
original = cv2.imread("images/jp_gates_original.png")
contrast = cv2.imread("images/jp_gates_contrast.png")
shopped = cv2.imread("images/jp_gates_photoshopped.png")
# convert the images to grayscale
original = cv2.cvtColor(original, cv2.COLOR_BGR2GRAY)
contrast = cv2.cvtColor(contrast, cv2.COLOR_BGR2GRAY)
shopped = cv2.cvtColor(shopped, cv2.COLOR_BGR2GRAY)
We will generate a matplotlib figure, loop over our images one-by-one, and add them to our plot. Our plot is then displayed to us.
Finally, we can compare our images together using the compare_images function.
# initialize the figure
fig = plt.figure("Images")
images = ("Original", original), ("Contrast", contrast), ("Photoshopped", shopped)
# loop over the images
for (i, (name, image)) in enumerate(images):
    # show the image
    ax = fig.add_subplot(1, 3, i + 1)
    plt.imshow(image, cmap = plt.cm.gray)
# show the figure
plt.show()
# compare the images
compare_images(original, original, "Original vs. Original")
compare_images(original, contrast, "Original vs. Contrast")
compare_images(original, shopped, "Original vs. Photoshopped")
Reference- https://www.pyimagesearch.com/2014/09/15/python-compare-two-images/
I have done this task as follows:
Compare file sizes.
Compare EXIF data.
Compare the first 'n' bytes, where 'n' is 128 to 1024 or so.
Compare the last 'n' bytes.
Compare the middle 'n' bytes.
Compare checksums.
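A rough sketch of the byte-level part of this list (file size, then first, middle and last n bytes), assuming plain files on disk. The EXIF step is left out, and probably_same_file is a made-up name. A True result only means that none of the cheap checks found a difference, so a checksum or full comparison would still be the final step.

```python
import os

def probably_same_file(path_a, path_b, n=1024):
    """Cheap checks: file size, then the first, middle and last n bytes."""
    size = os.path.getsize(path_a)
    if size != os.path.getsize(path_b):
        return False
    offsets = (0, max(0, size // 2 - n // 2), max(0, size - n))
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        for offset in offsets:
            fa.seek(offset)
            fb.seek(offset)
            if fa.read(n) != fb.read(n):
                return False
    return True  # no difference found in the sampled regions
```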
So I have a large array (2048x2048), and I would like to do some element-wise operations that depend on where the elements are. I'm very confused about how to do this (I was told not to use for loops, and when I tried them anyway, my IDE froze and everything ran really slowly).
Onto the question:
h = aperatureimage
h[:,:] = 0
indices = np.where(aperatureimage>1)
for True in h:
    h[index] = np.exp(1j*k*z)*np.exp(1j*k*(x**2+y**2)/(2*z))/(1j*wave*z)
So I have an index, which is (I'm assuming here) essentially a 'cropped' version of my larger aperatureimage array. *Note: aperatureimage is a grayscale image converted to an array; it has a shape or text on it, and I would like to find all the 'white' regions of the aperture and perform my operation on them.
How can I access the individual x/y values of index, which would allow me to perform my exponential operation? When I try index[:,None], the program spits out 'ValueError: broadcast dimensions too large'. I also get 'array is not broadcastable to correct shape'. Any help would be appreciated!
One more clarification: x and y are the only values I would like to change (essentially the points in my array where there is white; z, k, and everything else are defined previously).
I'm not sure the code I posted above is correct, it returns two empty arrays. When I do this though
index = (aperatureimage==1)
print len(index)
Actually, nothing I've done so far works correctly. I have a 2048x2048 image with a 128x128 white square in the middle of it. I would like to convert this image to an array, look through all the values and determine the index values (x,y) where the array is not black (I only have white/black, bilevel image didn't work for me). I would then like to take all the values (x,y) where the array is not 0, and multiply them by the h[index] value listed above.
I can post more information if necessary. If you can't tell, I'm stuck.
EDIT2: Here's some code that might help - I think I have the problem above solved (I can now access members of the array and perform operations on them). But - for some reason the Fx values in my for loop never increase, it loops Fy forever....
import sys, os
from scipy.signal import *
import numpy as np
import Image, ImageDraw, ImageFont, ImageOps, ImageEnhance, ImageColor

def createImage(aperature, type):
    imsize = aperature*8 #Add 0 padding to make it nice
    middle = imsize/2 # The middle (physical 0) of our image will be the imagesize/2
    im = Image.new("L", (imsize,imsize)) #Make a grayscale image with imsize*imsize pixels
    draw = ImageDraw.Draw(im) #Create a new draw method
    box = ((middle-aperature/2, middle-aperature/2), (middle+aperature/2, middle+aperature/2)) #Bounding box for aperature
    if type == 'Rectangle':
        draw.rectangle(box, fill = 'white') #Draw rectangle in the box and color it white
    del draw
    return im, middle

def Diffraction(aperaturediameter = 1, type = 'Rectangle', z = 2000000, wave = .001):
    # Constants
    deltaF = 1/8 # Image will be 8mm wide
    z = 1/3.
    wave = 0.001
    k = 2*np.pi/wave
    # Now let's get to work
    aperature = aperaturediameter * 128 # Aperaturediameter (in mm) to some pixels
    im, middle = createImage(aperature, type) #Create an image depending on type of aperature
    aperaturearray = np.array(im) # Turn image into numpy array
    # Fourier Transform of Aperature
    Ta = np.fft.fftshift(np.fft.fft2(aperaturearray))/(len(aperaturearray))
    # Transforming and calculating of Transfer Function Method
    H = aperaturearray.copy() # Copy image so H (transfer function) has the same dimensions as aperaturearray
    H[:,:] = 0 # Set H to 0
    U = aperaturearray.copy()
    U[:,:] = 0
    index = np.nonzero(aperaturearray) # Find nonzero elements of aperaturearray
    H[index[0],index[1]] = np.exp(1j*k*z)*np.exp(-1j*k*wave*z*((index[0]-middle)**2+(index[1]-middle)**2)) # Free space transfer for ap array
    Utfm = abs(np.fft.fftshift(np.fft.ifft2(Ta*H))) # Compute intensity at distance z
    # Fourier Integral Method
    apindex = np.nonzero(aperaturearray)
    U[index[0],index[1]] = aperaturearray[index[0],index[1]] * np.exp(1j*k*((index[0]-middle)**2+(index[1]-middle)**2)/(2*z))
    Ufim = abs(np.fft.fftshift(np.fft.fft2(U))/len(U))
    # Save image
    fim = Image.fromarray(np.uint8(Ufim))
    ftfm = Image.fromarray(np.uint8(Utfm))
    print("that may have worked...")

if __name__ == '__main__':
    Diffraction()
You'll need numpy, scipy, and PIL to work with this code.
When I run this, it goes through the code, but there is no data in the saved images (everything is black). Now I have a real problem here, as I don't entirely understand the math I'm doing (this is for homework), and I don't have a firm grasp of Python.
U[index[0],index[1]] = aperaturearray[index[0],index[1]] * np.exp(1j*k*((index[0]-middle)**2+(index[1]-middle)**2)/(2*z))
Should that line work for performing elementwise calculations on my array?
Could you perhaps post a minimal, yet complete, example? One that we can copy/paste and run ourselves?
In the meantime, in the first two lines of your current example:
h = aperatureimage
h[:,:] = 0
you set both 'aperatureimage' and 'h' to 0. That's probably not what you intended. You might want to consider:
h = aperatureimage.copy()
This creates a copy of aperatureimage, whereas your code simply makes h point to the same array as aperatureimage, so changing one changes the other.
Be aware that copying very large arrays might cost you more memory than you would prefer.
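To illustrate the difference between a second name for the same array and a real copy, here is a small self-contained demonstration with a made-up 3x3 array standing in for aperatureimage:

```python
import numpy as np

aperture = np.ones((3, 3))

alias = aperture            # same underlying array, just a second name
alias[:, :] = 0
sum_after_alias = aperture.sum()
print(sum_after_alias)      # 0.0: the original was cleared as well

aperture = np.ones((3, 3))
independent = aperture.copy()
independent[:, :] = 0
sum_after_copy = aperture.sum()
print(sum_after_copy)       # 9.0: the original is untouched
```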
What I think you are trying to do is this:
import numpy as np

N = 2048
M = 64
a = np.zeros((N, N))
a[N//2-M:N//2+M, N//2-M:N//2+M] = 1 # 128x128 block of ones in the middle
x,y = np.meshgrid(np.linspace(0, 1, N), np.linspace(0, 1, N))
b = a.copy()
indices = np.where(a>0)
b[indices] = np.exp(x[indices]**2+y[indices]**2)
Or something similar. This, in any case, sets some values in 'b' based on the x/y coordinates where 'a' is bigger than 0. Try visualizing it with imshow. Good luck!
Concerning the edit
You should normalize your output so it fits in the 8 bit integer. Currently, one of your arrays has a maximum value much larger than 255 and one has a maximum much smaller. Try this instead:
fim = Image.fromarray(np.uint8(255*Ufim/np.amax(Ufim)))
ftfm = Image.fromarray(np.uint8(255*Utfm/np.amax(Utfm)))
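The same rescaling can be wrapped in a small helper that also copes with an all-zero array. This is only a sketch: to_uint8 is a made-up name, the input is assumed to be non-negative (as the result of abs() is), and it rounds to the nearest integer instead of truncating.

```python
import numpy as np

def to_uint8(values):
    """Linearly rescale non-negative values to 0..255 for saving as an 8-bit image."""
    values = np.asarray(values, dtype=np.float64)
    peak = values.max()
    if peak == 0:   # all-black input: avoid dividing by zero
        return np.zeros(values.shape, dtype=np.uint8)
    return np.round(values / peak * 255).astype(np.uint8)

tiny = to_uint8([[0.0, 1e-4], [5e-5, 2e-4]])            # maximum far below 1
huge = to_uint8([[0.0, 4000.0], [1000.0, 2000.0]])      # maximum far above 255
print(tiny.max(), huge.max())   # both arrays now peak at 255
```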
Also consider np.zeros_like() instead of copying and clearing H and U.
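One detail that may matter here: np.array(im) for a mode "L" image has dtype uint8, and a copy of it cannot store the complex exponentials. A small sketch of np.zeros_like with an explicit complex dtype, using a made-up 4x4 array in place of aperaturearray:

```python
import numpy as np

aperture = np.zeros((4, 4), dtype=np.uint8)   # stand-in for np.array(im)
aperture[1:3, 1:3] = 255

# Same shape as the aperture, but able to hold complex phase factors
H = np.zeros_like(aperture, dtype=complex)
rows, cols = np.nonzero(aperture)
H[rows, cols] = np.exp(1j * np.pi / 2)

print(H.dtype, np.count_nonzero(H))
```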
Finally, I personally very much like working with IPython when developing something like this. If you put the code of your Diffraction function at the top level of your script (in place of 'if __name__ == '__main__':'), then you can access the variables directly from IPython. A quick command like np.amax(Utfm) would show you that there are indeed values != 0. imshow() is always nice for looking at matrices.