I'm currently working on lidar and camera fusion for object detection, distance estimation, and size estimation. I'm struggling with the width and height estimation using the lidar data (x and y coordinates).
I need help with a method that uses all the information extracted from the lidar sensor to estimate the object's size.
NB:
1- The bounding boxes are provided by the YOLOv5 algorithm.
2- I have already calculated the actual distance of each object inside a bounding box.
(Attached image, omitted here: the cyclist whose height and width I want to estimate.)
This is geometry around the "pinhole camera" model.
Let's first look at the unit circle.
(Diagram: a unit circle with the camera at the origin O and the object spanning B to C; originally from debraborkovitz.com.)
Your camera is at the origin, looking at B. The object is the segment BC. Say it is 1 meter away (O-B) and 0.5 meters tall (B-C). It spans a certain angle of your view on the unit circle; call that angle alpha (or theta, it doesn't matter).
tan(alpha) * 1.0 m = 0.5 m
tan(alpha) * distance[m] = length[m]
tan(alpha) = length[m] / distance[m]
alpha itself isn't important, but tan(alpha) is, because (at a given distance) it is proportional to the object's length. Just keep that in mind.
Focal length, in pixels, is just a scale factor that ties this to image resolution. Say f = 1000 px; then this object would appear 500 px tall, because
length[px] = f[px] * tan(alpha)
= f[px] * length[m] / distance[m]
Now, if lidar says the object is 5 m away, and the image says the object is 300 px tall/wide, you calculate
length[px] = f[px] * length[m] / distance[m]
rearrange
length[m] = length[px] / f[px] * distance[m]
length[m] = 300px / 1000px * 5m
length[m] = 1.5 m
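A minimal Python sketch of that calculation, assuming a YOLOv5 box in pixel coordinates (x1, y1, x2, y2), a per-box lidar distance in meters, and known focal lengths fx and fy in pixels (the function and variable names here are placeholders, not part of any particular API):

def estimate_size_m(bbox_xyxy, distance_m, fx_px, fy_px):
    """Estimate object width/height in meters from a pixel bounding box
    and a lidar distance, using length[m] = length[px] / f[px] * distance[m]."""
    x1, y1, x2, y2 = bbox_xyxy
    width_m = (x2 - x1) / fx_px * distance_m
    height_m = (y2 - y1) / fy_px * distance_m
    return width_m, height_m

# Example: a 300 px tall box at 5 m with f = 1000 px gives a 1.5 m tall object.
print(estimate_size_m((100, 200, 220, 500), 5.0, 1000.0, 1000.0))

This assumes the box fits the object tightly and the lidar distance really is the distance to the object, not to background points that fall inside the box.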
You need to know the focal length (in pixels) for your camera. That is either given by the manufacturer, somewhere in the documentation, or you have to calculate it. There are calibration methods available. You can also calculate it from manual measurements.
If you need to estimate it, just place a yardstick (or any object of known length) at a known distance, take a picture, measure its length in pixels, and use the previous equations to evaluate:
f[px] = length[px] * distance[m] / length[m]
If you know the sensor's pixel pitch, say 1.40 µm/px, and the true focal length (not the 35 mm equivalent), say 4.38 mm, then f[px] = 4.38 mm / (1.40 µm/px) ≈ 3129 px. Those values are roughly representative of smartphone cameras and some webcams.
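A short sketch of both options; every number below is a made-up example, not taken from the question:

# Option 1: calibrate with an object of known length at a known distance.
length_px = 412.0     # measured length of the yardstick in the image
distance_m = 3.0      # tape-measured distance from camera to yardstick
length_m = 0.9144     # physical length of a yardstick
f_px = length_px * distance_m / length_m    # f[px] = length[px] * distance[m] / length[m]
print(f_px)                                 # ~1352 px for these numbers

# Option 2: from the datasheet, if pixel pitch and true focal length are known.
pixel_pitch_mm = 1.40e-3    # 1.40 µm per pixel
focal_length_mm = 4.38
print(focal_length_mm / pixel_pitch_mm)     # ~3129 px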
I'm working on spatial frequency filtering using code from this site.
https://www.djmannion.net/psych_programming/vision/sf_filt/sf_filt.html
There is similar code here on stack exchange. What I was wondering though is how to convert the cutoff used in the Butterworth filter, which is a number from 0 to 1, to cycles / degree in the image when I report it. I feel like I'm missing something obvious. I'm imagining it has to do with the visual angle the image subtends and the resolution.
The Butterworth filter commonly used in psychology software to filter spatial frequencies from images is typically applied as a low-pass or high-pass filter, with the cutoff given as a value between 0 and 1. That value is the filter's 50% point: for a low-pass filter, frequencies below the cutoff keep 50% or more of their amplitude and frequencies above it keep 50% or less; the opposite holds for a high-pass filter. The transition near the cutoff is steep, so the cutoff is a good single number to describe your filtered image.
So, what is the cutoff value and what does it mean? It's just a proportion of the maximum frequency in cycles/pixel. Once you know that maximum it's easy to derive, and it turns out the maximum is a constant. Suppose your image is 128x128. In 128 pixels you could have at most 64 cycles, or 0.5 cycles/pixel. Looking at the fftfreq function in numpy or MATLAB shows that the maximum is always 0.5 cycles/pixel, which makes sense because you need two pixels per period. This means that a cutoff of 0.2 corresponds to 0.1 cycles/pixel: whatever cutoff you pick, just divide it in half and that's the cycles/pixel. Then all you need to do is scale that to cycles/degree.
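A quick numpy check of that claim (just a sanity check, not part of the original answer):

import numpy as np

freqs = np.fft.fftfreq(128)     # sample frequencies in cycles/pixel for a 128-sample axis
print(np.abs(freqs).max())      # 0.5, the Nyquist frequency

cutoff = 0.2
print(cutoff * 0.5)             # 0.1 cycles/pixel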
Cycles/pixel can be converted to cycles/degree of visual angle once you know the size of the presented image in degrees of visual angle. For example, if someone's eye is at distance d from an image of height h, the image spans 2 * atan((h/2) / d) degrees of visual angle (assuming your atan function returns degrees rather than radians; you may have to convert). Take the number of pixels in your image and divide it by that total span in degrees to get pixels/degree. Then multiply frequency (cycles/pixel) by pixels/degree to get cycles/degree.
Pseudo-code version of the equations below (in Python, atan comes from math and returns radians, hence the conversion to degrees):
cycles_per_pixel = cutoff * 0.5
d = distance_from_image_in_cm
h = height_of_image_in_cm
degrees_of_visual_angle = degrees(2 * atan((h / 2) / d))
pixels_per_degree = total_pixels / degrees_of_visual_angle
cycles_per_degree = cycles_per_pixel * pixels_per_degree
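A runnable worked example; the cutoff, image size, viewing distance, and image height below are made-up values:

import math

cutoff = 0.2           # Butterworth cutoff between 0 and 1
total_pixels = 128     # image height in pixels (a 128x128 image)
h = 10.0               # physical height of the displayed image, in cm
d = 57.0               # viewing distance, in cm

cycles_per_pixel = cutoff * 0.5
degrees_of_visual_angle = math.degrees(2 * math.atan((h / 2) / d))
pixels_per_degree = total_pixels / degrees_of_visual_angle
cycles_per_degree = cycles_per_pixel * pixels_per_degree
print(cycles_per_degree)    # ~1.28 cycles/degree for these numbers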
I've got an image of a chessboard of known size (the cyan line drawn on it is 2 cm long).
The naive way of determining the FOV would be something like this:
catX = x1 - x0
catY = y1 - y0
hypoPx = sqrt(catX ** 2 + catY ** 2)
pxRatio = hypoPx / 200 # pixels/mm
pxHeight, pxWidth = img.shape[:2]
width, height = pxWidth / pxRatio, pxHeight / pxRatio
But it doesn't account for the perspective distortion.
So I got its rotation and translation vectors using solvePnPRansac (the axes drawn on the image illustrate its orientation correctly).
I suppose that should be enough data to determine the field of view in mm almost precisely, but I couldn't get any further; I'm not very good with matrices and such. Any hints?
The FOV is atan(object size / distance to object). You will have two FOVs: fov_x and fov_y. The calibration object must be parallel to the sensor plane.
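A small Python sketch of that idea, assuming you know the physical extent covered by the full frame and the distance to a calibration plane parallel to the sensor. It uses the exact 2 * atan(half-extent / distance) form, which reduces to atan(extent / distance) for small angles; all names and numbers are placeholders:

import math

def fov_deg(extent_mm, distance_mm):
    """Full angle subtended by an extent centred on the optical axis."""
    return math.degrees(2 * math.atan((extent_mm / 2) / distance_mm))

# e.g. the frame covers 320 mm x 240 mm of a plane 400 mm away (made-up numbers)
fov_x = fov_deg(320.0, 400.0)   # ~43.6 degrees
fov_y = fov_deg(240.0, 400.0)   # ~33.4 degrees
print(fov_x, fov_y)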
I have the following situation:
A point on the Earth's surface with 3D coordinates (X, Y, Z), and a camera inside an airplane that took a picture of the surface. For the camera, I also have the 3D coordinates (X, Y, Z) at the exact moment the image was taken.
For this scenario I need to calculate the angle between the light reflected at the surface point and the camera inside the airplane.
I would like suggestions or ideas on how to calculate this angle. I know a possible solution will use analytic geometry.
I have calculated the sun incidence angle at the surface point using the pvlib library, but I couldn't find a pvlib function to determine the light reflection angle.
Thanks for your help!
I suppose you used the sun elevation and azimuth angles to calculate the sun incidence vector with a formula such as the following (taking azimuth as N=0 / E=90 / S=180 / W=270):
Vx_s = sin(sun_azim) * cos(sun_elev)
Vy_s = cos(sun_azim) * cos(sun_elev)
Vz_s = sin(sun_elev)
Considering reflection off a flat horizontal surface (normal vector pointing to the zenith), the reflected-light vector (forward, specular reflection as from a mirror, ignoring scattering/dispersion) is:
Vx_r = sin(sun_azim + 180) * cos(sun_elev)
Vy_r = cos(sun_azim + 180) * cos(sun_elev)
Vz_r = sin(sun_elev)
The vector from the surface point to the airplane camera is:
Vx_p = X_plane - X_surface
Vy_p = Y_plane - Y_surface
Vz_p = Z_plane - Z_surface
Then the angle between the reflected ray and the surface-to-camera vector is (note that this vector is not a unit vector, hence the normalization):
alpha = arccos( (Vx_p*Vx_r + Vy_p*Vy_r + Vz_p*Vz_r) / sqrt(Vx_p**2 + Vy_p**2 + Vz_p**2) )
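A minimal numpy sketch of those steps, with angles in degrees and all coordinates assumed to be in the same local Cartesian frame; the example numbers at the end are made up:

import numpy as np

def reflection_angle_deg(sun_azim_deg, sun_elev_deg, surface_xyz, camera_xyz):
    """Angle between the specularly reflected sun ray (flat horizontal surface)
    and the surface-to-camera direction, in degrees."""
    az = np.radians(sun_azim_deg)
    el = np.radians(sun_elev_deg)
    # reflected ray off a horizontal mirror-like surface (azimuth flipped by 180 deg)
    v_r = np.array([np.sin(az + np.pi) * np.cos(el),
                    np.cos(az + np.pi) * np.cos(el),
                    np.sin(el)])
    # vector from the surface point to the camera
    v_p = np.asarray(camera_xyz, dtype=float) - np.asarray(surface_xyz, dtype=float)
    cos_alpha = v_p @ v_r / np.linalg.norm(v_p)   # v_r is already a unit vector
    return np.degrees(np.arccos(np.clip(cos_alpha, -1.0, 1.0)))

# sun at azimuth 135 deg, elevation 40 deg; camera 1000 m above and offset to the south-west
print(reflection_angle_deg(135.0, 40.0, (0.0, 0.0, 0.0), (-500.0, -500.0, 1000.0)))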
I am trying to learn about Perlin Noise and procedural generation. I am reading through an online tutorial about generating landscapes with noise, but I don't understand part of the author's explanation about making areas with higher elevation.
On this webpage under the "islands" section there is the text
Design a shape that matches what you want from islands. Use the lower shape to push the map up and the upper shape to push the map down. These shapes are functions from distance d to elevation 0-1. Set e = lower(d) + e * (upper(d) - lower(d)).
I want to do this, but I'm not sure what the author means when they're talking about upper and lower shapes.
What could the author mean by "Use the lower shape to push the map up and the upper shape to push the map down"?
Code Example:
from __future__ import division
import numpy as np
import math
import noise
def __noise(noise_x, noise_y, octaves=1, persistence=0.5, lacunarity=2):
    """
    Generates and returns a noise value.
    :param noise_x: The noise value of x
    :param noise_y: The noise value of y
    :return: numpy.float32
    """
    value = noise.pnoise2(noise_x, noise_y,
                          octaves, persistence, lacunarity)
    return np.float32(value)

def __elevation_map():
    elevation_map = np.zeros([900, 1600], np.float32)
    for y in range(900):
        for x in range(1600):
            noise_x = x / 1600 - 0.5
            noise_y = y / 900 - 0.5
            # find distance from center of map
            distance = math.sqrt((x - 800) ** 2 + (y - 450) ** 2)
            distance = distance / 450
            value = __noise(noise_x, noise_y, 8, 0.9, 2)
            value = (1 + value - distance) / 2
            elevation_map[y][x] = value
    return elevation_map
The author means that you should describe the final elevation, fe, of a point in terms of its distance from the centre, d, as well as the initial elevation, e, which was presumably generated by noise.
So, for example, if you wanted your map to look something like a bowl, but maintaining the noisy characteristic of your originally generated terrain, you could use the following functions:
def lower(d):
    # the lower elevation is 0 no matter how near you are to the centre
    return 0

def upper(d):
    # the upper elevation varies quadratically with distance from the centre
    return d ** 2

def modify(d, initial_e):
    return lower(d) + initial_e * (upper(d) - lower(d))
Note in particular the paragraph on the linked page starting with "How does this work?", which I found quite illuminating.
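As a rough illustration of how modify could slot into your code, here is a hedged sketch that applies it over a whole map of initial elevations (for example the output of your __elevation_map, rescaled to 0-1); bowl_elevation_map and its normalization are my own made-up choices, not part of the answer:

import numpy as np

def bowl_elevation_map(noise_map):
    """Reshape a map of initial elevations in [0, 1] with modify()."""
    h, w = noise_map.shape
    out = np.zeros_like(noise_map)
    for y in range(h):
        for x in range(w):
            # same kind of centre-distance normalization as in __elevation_map
            d = np.hypot(x - w / 2, y - h / 2) / (h / 2)
            out[y, x] = modify(d, noise_map[y, x])
    return out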
I have an image of a grid of holes. Processing it with numpy.fft.fft2 yields a nice image where I can clearly see periodicity, base vectors etc.
But how can I extract the lattice spacing?
The lattice points in real space have a spacing of about 96 px, so the spacing in k-space should be 2*Pi / 96 px = 0.065 1/px.
Naturally, numpy can't return an image array with sub-pixel spacing, so the output must be scaled somehow; the spacing of the peaks in k-space is about 70 px.
But how is the scaling done and what is the exact scaling factor?
The frequency step of numpy.fft.fft2's output is 1 cycle per full image length per output pixel (bin), under the assumption that the input is periodic with a period equal to the full input length.
So, if you have an fft2 output with a size of 6720 x 6720 pixels and with a spike at the 70th pixel, you may expect a periodic component in the spatial domain with a period of:
1 / (70 pixels * 1 cycle / 6720 pixels / pixel) = 96 pixels/cycle.
Correspondingly, if you have an input image with a size of 6720 x 6720 pixels with elements that are repeating every 96 pixels, you will get a spike in the frequency domain at:
(1 / (96 pixels/cycle)) / (1 cycle / 6720 pixels / pixels) = 70 pixels.
While this is unit accurate, perhaps a simpler way to look at it is:
spatial-domain-period-in-pixels = image-size-in-pixels / frequency-domain-frequency-in-pixels
frequency-domain-frequency-in-pixels = image-size-in-pixels / spatial-domain-period-in-pixels
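A small self-contained numpy sketch of that relation, using a synthetic stand-in for the hole-grid image (the real image isn't available here), with deliberately naive peak picking:

import numpy as np

# synthetic stand-in for the hole image: N x N with a 96 px lattice of small blobs
N = 672
yy, xx = np.mgrid[0:N, 0:N]
img = ((xx % 96 < 4) & (yy % 96 < 4)).astype(float)

spectrum = np.abs(np.fft.fftshift(np.fft.fft2(img)))
spectrum[N // 2, N // 2] = 0                  # suppress the DC peak at the centre

peak_y, peak_x = np.unravel_index(np.argmax(spectrum), spectrum.shape)
freq_px = np.hypot(peak_x - N // 2, peak_y - N // 2)   # distance from centre, in bins

lattice_spacing_px = N / freq_px              # image-size / frequency-in-bins
print(lattice_spacing_px)                     # ~96 for this synthetic grid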